ABSTRACT

We propose a framework that provides a programming interface
to perform complex dynamic system-level analyses
of deployed production systems. By leveraging hardware 
support for virtualization available nowadays on all com- 
modity machines, our framework is completely transparent 
to the system under analysis and it guarantees isolation of 
the analysis tools running on top of it. Thus, the internals
of the kernel of the running system need not be modified,
and the whole platform runs unaware of the framework.
Moreover, errors in the analysis tools do not affect the run- 
ning system and the framework. This is accomplished by 
installing a minimalistic virtual machine monitor and migrating
the system, as it runs, into a virtual machine. To
demonstrate the potential of our framework, we
developed an interactive kernel debugger, nicknamed HyperDbg.
HyperDbg can be used to debug any critical
kernel component, and even to single step the execution of 
exception and interrupt handlers. 

Categories and Subject Descriptors 

D.2.5 [Software Engineering]: Testing and Debugging — 
Debugging aids, Monitors, Tracing; D.4.9 [Operating Sys- 
tems]: Systems Programs and Utilities 

General Terms 

Verification 

Keywords 

hardware virtualization, debugging, system analysis 

1. INTRODUCTION 

Operating systems are peculiar and very complex pieces 
of software whose internals are critically vital for a system: 
a failure, or a bottleneck, in any of their parts can lead to 
catastrophic consequences. Therefore, special care is needed
to develop, analyze, test, and profile them. To simplify their 
task, developers and analysts rely on a large variety of tools 
and analysis techniques. Some of them are specific for study- 
ing static properties of the operating system, while others 
are more specific for studying dynamic properties. In par- 
ticular, the latter class of tools and techniques is nowadays 
very popular among kernel developers and analysts because 
it allows them to collect the information very quickly, while 
hiding many of the intricacies of the kernel, and can even be 
used on running production systems. 

Existing approaches for dynamic analysis of operating sys- 
tems (e.g., debugging, profiling, and tracing) can be roughly 
classified in two groups: kernel-based and VMM-based. The 
approach taken by the first group is to include some com- 
ponent into the kernel in order to intercept all the events of 
interest (e.g., the creation of a new process, the execution
of a system call, and the execution of a kernel function) and 
to execute a specific action when such events occur [2, 11, 
14, 18, 20]. This solution requires the installation of specific 
hooks in the kernel to monitor run-time events and it might 
be very difficult to apply to operating systems that do not 
natively offer facilities for dynamic analysis, especially when 
the source code is not available. The approach taken by the 
second group is to run the kernel and user-space applica- 
tions in a virtual machine and to intercept, and respond 
to, the events of interest from the virtual machine moni- 
tor (VMM) [9]. Although this approach guarantees trans- 
parency and has a loose dependency on the operating sys- 
tem internals, it cannot be used in all the settings, since it 
implies that the system must be run as a guest of a vir- 
tual machine and production systems not running in virtual 
machines cannot be analyzed. Moreover, VMM-based solu- 
tions typically virtualize hardware devices, to allow multiple 
guests to share the same physical peripherals. This makes 
software virtualization approaches unsuitable to assist the 
analysis of components that need to interact directly with 
the underlying hardware. 

In this paper we propose a framework that brings to- 
gether the advantages of both approaches: it can be used on 
commodity production systems (i.e., off-the-shelf products,
whose source code or debugging symbols are not necessarily
available), since it does not require instrumenting the system
under test, and it is able to inspect systems running on
real hardware, since it does not require an emulation con- 
tainer. Similarly to existing frameworks, the analyses that 
can be built on top of our framework include profiling and 
tracing of the kernel and user-space applications, interactive
debugging, or even extension of system features. However, 
differently from existing frameworks, ours is fully dynamic, 
transparent, loosely dependent on the operating system, and 
fault-tolerant with respect to possible defects in the anal- 
ysis code. First, our framework does not require recompi- 
lation or rebooting of the target system. Thus, it can be 
used to analyze any running production system, including 
commodity operating systems lacking native support for in- 
strumentation and systems not running in virtual machines. 
Second, the framework is not invasive, since analyses can be 
performed on a virtually unmodified system: as explained in 
the paper, only a minimal driver needs to be installed and no 
parts of the kernel are patched in any way. Moreover, since 
the framework itself is not accessible from the target system, 
its code cannot be detected by malicious code or unwittingly
influenced by buggy operating system components. Thus, the
infrastructure can be applied to any operating system, as
the majority of the facilities it supports are completely OS-independent,
and the only OS-dependent functionalities are
provided just to ease the development of analysis tools. Finally,
the framework is fault-tolerant, as it guarantees that
a defect in an analysis tool built on top of it damages
neither the framework itself nor the analyzed system.

Our framework leverages hardware extensions for virtualization
available on commodity x86 CPUs [1, 15]. Hardware
support for virtualization allows the development of virtual
machine monitors that are very efficient, completely transparent,
and non-invasive to the systems running in the virtual
machine. To overcome the major limitation of traditional
VMM-based approaches (i.e., the impossibility of analyzing
production systems not running in a virtual machine),
our framework exploits a feature of the hardware that makes it
possible to install a virtual machine monitor and to migrate a running
system into a virtual machine. When the analysis is
completed, the original mode of operation of the system can 
be restored. Practically speaking, our framework is a mini- 
malistic virtual machine monitor acting as a broker between 
the analyzed system and the analysis tool. The framework 
abstracts low-level events occurring in the analyzed system 
into high-level events and guarantees fault-tolerance by relying
on the hardware to run the analysis tool in an isolated
execution environment. 

To demonstrate the potential of our framework, we have
developed an interactive kernel debugger, nicknamed HyperDbg,
constructed entirely using the programming interface
exposed by our infrastructure. HyperDbg adds
live and interactive debugging support to Microsoft Windows
XP, so far only possible using very invasive tools, like
Syser [19], or traditional VMM-based debuggers. HyperDbg
can be used to debug any component of the Windows
kernel, including interrupt/exception handlers and device
drivers, and even supports single-instruction stepping. Being
completely separated from the debuggee, HyperDbg is
transparent to the analyzed system and can even be used to
analyze protected and malicious code.

In summary, the paper makes the following contributions. 

1. We propose a framework to perform complex dynamic
system-level analyses of commodity production systems.
Compared to existing frameworks, the one we
propose guarantees transparency and efficiency, and does
not require the target system to be already installed
on a virtual machine.



2. We implemented our framework in an experimental 
prototype for Microsoft Windows XP. 

3. We describe the design and the implementation of HyperDbg,
a kernel-level interactive debugger built on
top of our framework.

Both the analysis framework and HyperDbg are available
at http://security.dico.unimi.it/hyperdbg/ and are
released under the terms and conditions of the GPL (v3.0)
license.

2. RELATED WORK 

The framework proposed in this paper shares many simi- 
larities with frameworks and techniques extensively explored 
in the past. However, by exploiting recent facilities available
on modern Intel x86 CPUs, our framework is able to combine
and to offer simultaneously the main benefits introduced by 
previous research work. 

Dynamic Kernel Instrumentation. 

DTrace is a facility included in the Solaris kernel that allows
the dynamic instrumentation of production systems [2].
The key points of DTrace are efficiency and flexibility. First, 
the instrumentation framework itself introduces no over- 
head. Second, the framework provides tens of thousands 
of instrumentation points, and the actions to be taken can 
be expressed in terms of a high-level control language, that 
also includes a number of mechanisms to guarantee run-time
safety. Similarly, KernInst is a dynamic instrumentation
framework for commodity kernels [20]. KernInst has
been developed mainly to gather information about the performance
of a running kernel, but it has also been employed
for run-time kernel optimization. Unlike
DTrace, KernInst does not provide any mechanism for run-time
safety of the instrumentation routines. Unfortunately,
the aforementioned approaches are not transparent, as they
require direct modifications of the operating system kernel,
achieved by loading a kernel-mode module. Moreover, none
of them is OS-independent, and they cannot be applied
to closed-source operating systems. Our framework does
not suffer from these limitations, since it can instrument the kernel
without modifying it and does not rely on any facility
offered by the kernel.

Kernel-level Debugging. 

Several efforts have been made to develop efficient and 
reliable kernel-level debuggers. Indeed, these applications 
are essential for many activities, such as the development 
of device drivers. One of the first and most widely used 
kernel-level debuggers that targeted the Microsoft Windows 
operating system was SoftICE [18], but today the project
has been discontinued. However, both commercial [19] and
open-source [17] alternatives to SoftICE have appeared. Modern
versions of Windows already include a kernel debugging
subsystem [14]. Unfortunately, to exploit the full capabilities
of Microsoft's debugging infrastructure, the host being
debugged must be physically linked (e.g., by means of a serial
cable) with another machine. All these approaches share
a common factor: to debug kernel-level code, they leverage
another kernel-level module. Obviously, that is like a dog
chasing its tail. The framework proposed in this paper
requires neither kernel support nor modifications to the kernel to
add the missing support at run-time.

Figure 1: Overview of the framework



Frameworks Based on Virtual Machines. 

Instead of relying on a kernel-level module to monitor 
other kernel code, an alternative approach consists of running
the target code inside a virtual machine and performing
the required analyses from the outside [9]. In [10, 22, 7]
the authors propose virtual machines with execution replay- 
ing capabilities: a user can move forward and backwards 
through the execution history of the whole system, both 
for debugging and for understanding how a hacker intrusion 
took place. Finally, in [3] Chow et al. propose Aftersight, 
a system that decouples execution recording from execution 
trace analysis, thus reducing the overhead suffered by the 
system where the guest operating system is run. Nowadays,
Aftersight is part of the VMware platform, and other
mainstream commercial products provide similar capabilities.
The framework proposed in this paper can provide
these functionalities even on systems not running in any vir- 
tual machine. 

Aspect-oriented Programming. 

Aspect-oriented programming is a paradigm that promises
to increase modularity by encapsulating cross-cutting concerns
into separate code units, called "aspects", whose "advice"
code is woven into the system automatically, by specifying
the properties of the join-points. AspectC is an aspect-oriented
framework that is used to customize (at compile-time)
operating system kernels [4, 12, 13]. More dynamic
approaches have been proposed: for example, TOSKANA
provides before, after, and around advice for in-kernel functions
and supports the implementation of aspects themselves
as dynamically exchangeable kernel modules [8]. The framework
proposed in this paper makes it possible to achieve the same goal
while being transparent and fault-tolerant.

3. OVERVIEW OF THE FRAMEWORK 

Figure 1 depicts the architecture of our framework, the 
installation and removal processes, and the migration of the 
operating system and its applications into a virtual machine. 
Our framework consists of a virtual machine monitor (VMM 
for short) that provides a programming interface for the de- 
velopment of system-level analysis tools. As in traditional 
VMM-based analysis approaches, the analysis tool is run 
within the VMM and is thus completely transparent to guests
of the virtual machine. However, compared to traditional
VMM-based ones, ours does not require the system to be
already running inside any virtual machine. To achieve this
goal, our framework leverages hardware extensions for virtualization
available on all modern x86 CPUs [15, 1] (which are
unused in the majority of the deployments). In short, these
extensions augment the instruction set architecture with two
new modes of operation: VMX root mode and VMX non-root
mode^. These new modes of operation logically separate
the virtual machine monitor from a guest without having to
modify the latter. More precisely, we exploit a particular
feature of these extensions that allows for late launching of
VMX modes. Late launching of VMX modes makes it possible to
install a virtual machine monitor even if the system has already
been bootstrapped. In other words, late launching
makes it possible to migrate (temporarily) a running operating system
into a virtual machine, and to analyze and control the execution
of the system from the monitor. Throughout the rest of the
paper, we use the term "guest" to refer to the system under
analysis that has been migrated into a virtual machine.

Practically speaking, the running operating system is not 
migrated anywhere and not touched at all. Rather, by 
launching VMX modes, the execution environment is ex- 
tended with the two aforementioned operating modes; the 
running operating system is then associated with non-root 
mode, while the VMM is associated with root mode. Thus, 
in all respects, the operating system and its applications be- 
come a guest of our special virtual machine. Following the 
same principle, the VMM can be unloaded, and the original 
mode of execution of the operating system restored, by sim- 
ply disabling VMX modes. After the launch of the VMX 
modes, the execution of the guest can continue exactly as 
before, even in terms of interactions with the underlying 
hardware devices. However, during its execution, the guest 
might be interrupted by an exit to root mode. Like hardware 
exceptions, exits are events that block the execution of the 
guest, switch from non-root mode to root mode, and transfer 
the control to the VMM. Differently from exceptions, the set 
of events triggering exits to root mode can be configured dy- 
namically by the VMM. A routine of the VMM handles the 
exit and eventually enters non-root mode to resume the execution
of the guest. Being executed at the highest privilege
level, the routine handling the exit has complete read/write
control of the state of the guest system (of both memory
and CPU registers).

The framework itself does not perform any analysis. It is 
only responsible for handling a small set of exits to control 
all accesses to the memory management unit of the CPU, to 
prevent the guest from accessing the physical memory loca- 
tions holding the code and the data of the framework. On 
the other hand, the framework provides a flexible API to de- 
velop tools to perform sophisticated analyses of both kernel 
and user code running in the guest. Using the function- 
alities provided through the API, the tool can request the 
framework to monitor certain events that might occur dur- 
ing the execution of the guest; when such events occur, it can 
inspect, and even manipulate, the state of the guest. The 
events that can be monitored include, but are not limited to, 
system call invocations, function calls, context switches and 
I/O operations. Practically speaking, events are monitored
through exits to root mode. Thus, a request of the analysis
tool to monitor a certain high-level event (e.g., the execution
of a system call) is translated by the API of the framework
into a sequence of low-level operations that guarantee
that all the occurrences of such event in the guest trigger an
exit to root mode. Similarly, the framework translates the
exit into a higher-level event and notifies the occurrence of
the event to the analysis tool. Once notified, the tool can
recover information about the event (e.g., arguments and
return value of a system call), using the inspection functionalities
offered by the API.

^VMX (non-)root mode is the terminology used by Intel;
AMD adopts a different terminology.

Figure 2: A close-up of the framework
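
The two translations described above (from a high-level request to a low-level exit configuration, and from an exit back to a high-level event) can be illustrated with a minimal sketch. It is a toy model written in Python, not the code of the framework: the class EventGate, the exit-reason strings, and the event-to-exit mapping are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from high-level events to the low-level exit
# reason that must be enabled to observe them (illustrative only).
EXIT_FOR_EVENT = {
    "ProcessSwitch": "CR3_WRITE",
    "Exception": "EXCEPTION",
    "IOOperationPort": "IO_INSTRUCTION",
}


class EventGate:
    """Toy model of the broker between guest exits and the analysis tool."""

    def __init__(self):
        self.enabled_exits = set()            # exits the "CPU" is configured to raise
        self.subscribers = defaultdict(list)  # high-level event -> callbacks

    def subscribe(self, event, callback):
        # A high-level request is translated into low-level configuration.
        self.enabled_exits.add(EXIT_FOR_EVENT[event])
        self.subscribers[event].append(callback)

    def on_exit(self, exit_reason, guest_state):
        # An exit is translated back into high-level events and delivered.
        if exit_reason not in self.enabled_exits:
            return 0  # in this model, unconfigured events never reach the tool
        delivered = 0
        for event, reason in EXIT_FOR_EVENT.items():
            if reason == exit_reason:
                for callback in self.subscribers[event]:
                    callback(event, guest_state)
                    delivered += 1
        return delivered


def demo():
    gate = EventGate()
    seen = []
    gate.subscribe("ProcessSwitch", lambda ev, st: seen.append((ev, st["cr3"])))
    gate.on_exit("CR3_WRITE", {"cr3": 0x1000})
    gate.on_exit("IO_INSTRUCTION", {"port": 0x60})  # not subscribed: ignored
    return seen
```

In this model, demo() delivers only the process switch to the tool, since no subscription enabled exits for I/O instructions.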

An important requirement for the analysis of production 
systems is that analysis tools must not interfere with the 
correct execution of the guest. This is particularly impor- 
tant for faults and deadlocks that might occur in the analysis 
tool. The approach we adopt is to run the tool in a less privi- 
leged execution environment, isolated from the analyzed sys- 
tem and from the framework. The tool can interact with the 
guest only through the API exposed by the framework. This 
approach guarantees the framework the ability to intercept 
any fault occurring in the tool, to mediate all accesses to 
the analyzed system (and to prevent write accesses), and to
terminate the tool in case of deadlocks or other anomalous 
situations. 

4. DESIGN AND IMPLEMENTATION 

Figure 2 shows a more detailed view of the architecture 
of our framework. Intuitively, this architecture is very simi- 
lar to that of traditional operating systems: the framework 
plays the role of the kernel and the analysis tool plays the 
role of a user-space application. As will become clear later, 
this architecture prevents buggy analysis tools from compro- 
mising the guest system and the framework. The separation 
between these two parts is made possible by the fact that, 
when VMX is enabled, root and non-root modes offer two 
fully-featured execution environments. Thus, like the guest 
running in non-root mode, the framework running in root 
mode can rely on privilege separation to isolate the analysis 
tool and can handle independently interrupts and exceptions 
that might occur while executing in root mode. 

When an exit to root mode interrupts the execution of 
the guest, the event is delivered to the event gate (step 1
in Figure 2). The event gate is responsible for abstracting
low-level events into higher-level ones, and for notifying the
analysis tool if the latter has requested to be notified (step 2).
On startup the analysis tool requests the framework to be 
notified of certain events (not shown in the figure). The 
tool can use the API provided by the framework to query extra
information about the event (e.g., the content of the stack
location storing one of the arguments of a function). Since 
the tool is isolated from the framework, API functions are 
invoked through software interrupts. Thus, requests coming 
from the analysis tool are received by the trap gate (step 
3), then forwarded to the component implementing the API 
(step 4). The tool can perform two types of API calls: (step 
4a) to inspect or manipulate the state of the guest, and (step 
4b) to control event notifications (e.g., enable or disable the 
notification of certain events). Note that the component 
implementing the API is also used by the framework itself 
(step 5) to recover extra information about events (e.g., the
return address of a function stored in the stack). The trap
gate also serves the purpose of detecting exceptions (e.g.,
page faults) that might occur during the execution of the
analysis tool. If the trap gate intercepts an exception (step
6), it terminates the faulty tool and unloads the framework,
to resume the normal operation mode of the system. Finally,
the trap gate is also used to handle timer interrupts (step
7) that, as will be discussed in Section 4.4, are employed to
enforce a time bound on the execution of the tool.
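
The supervision performed by the trap gate can be sketched as follows. This is a simplified Python model under stated assumptions, not the implementation of the framework: the tool is modelled as a generator whose every yield stands for one timer interrupt, max_ticks is a hypothetical time bound, and unloading the framework on a timeout (rather than only terminating the tool) is an assumption of the sketch. All names are illustrative.

```python
class Framework:
    """Toy model of the trap gate: tool faults and timeouts are contained."""

    def __init__(self, max_ticks):
        self.max_ticks = max_ticks  # hypothetical time bound, in timer ticks
        self.loaded = True
        self.log = []

    def api_call(self, name):
        # Steps 3-4: requests from the tool enter through the trap gate.
        self.log.append(name)
        return 0

    def notify(self, tool, event):
        """Run the tool's handler for one event under fault and time supervision."""
        ticks = 0
        try:
            for _ in tool(self, event):   # each yield models one timer interrupt
                ticks += 1
                if ticks > self.max_ticks:
                    return self._unload("timeout")
        except Exception as exc:          # step 6: exception raised by the tool
            return self._unload("fault: " + type(exc).__name__)
        return "ok"

    def _unload(self, reason):
        self.loaded = False               # the guest resumes its normal operation
        return reason


def good_tool(fw, event):
    fw.api_call("read_guest_memory")
    yield


def faulty_tool(fw, event):
    yield
    raise ZeroDivisionError("bug in the tool")


def looping_tool(fw, event):
    while True:                           # models a deadlocked or runaway tool
        yield
```

The point of the design is that neither a fault nor a runaway handler propagates to the guest: in both cases control returns to the supervising code, which decides how to recover.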

The functionalities provided by the API of the framework 
can be classified into two classes: execution and I/O tracing,
and state inspection and manipulation. The following paragraphs
briefly describe the API. More details are given in
Sections 4.2 and 4.3. 

Execution and I/O tracing facilities allow a tool to inter- 
cept the occurrence in the analyzed system of certain events 
and certain I/O operations respectively. Table 1 reports the 
main types of events that can be traced. For each event, the 
table also reports the arguments associated with the event; arguments
are the pieces of information about the events most commonly
used in tools. For example, the events FunctionEntry and
SyscallEntry are used to trace functions and system calls,
respectively. The arguments associated with the FunctionEntry
event are the address (or the name) of the function
called, the caller and the return address. Another exam- 
ple is the ProcessSwitch event that can be used to trace 
context switches between processes (not threads). From the 
point of view of the analysis tool all the events are handled 
in the same way: the tool can subscribe to any event and, 
when the event occurs, can inspect its arguments and take 
the proper actions. However, at the framework level, certain
events differ from others. Indeed, some of them
(e.g., context switches between processes) can be traced directly
by the hardware. That is, the event triggering the exit
corresponds exactly to the event being traced. Other events
(e.g., function calls and returns) cannot be traced
directly by the hardware. In all these cases, the framework
relies on other low-level events to trace the execution and
then abstracts the resulting low-level events into higher-level ones,
meaningful for the analysis tool.

Event            Description                          Arguments

ProcessSwitch    Context (process) switch             -
Exception        Exception                            Exception vector, faulty instruction, error code
Interrupt        Hardware or software interrupt       Interrupt vector, requesting instruction
BreakpointHit    Execution breakpoint                 Breakpoint address
WatchpointHit    Watchpoint on data read/write        Watchpoint address, access type, hitting instruction
FunctionEntry    Function call                        Function name/address, caller/return address
FunctionExit     Return from function                 Function name/address, return address
SyscallEntry     System call invocation               System call number, caller/return address
SyscallExit      Return from system call              System call number, return address
IOOperationPort  I/O operation through hardware port  Port number, access type
IOOperationMmap  Memory-mapped I/O operation          Memory address, access type

Table 1: Events traceable using our framework and corresponding arguments (the argument that represents
the current process is omitted, as it is common to all the events)

Arguments can optionally be used as conditions, to limit
the tracing to a subset of all the events. Conditions on events
serve two purposes. First, conditions simplify the
analysis tools, since events that do not match the requested
conditions are discarded by the framework and thus do not
need to be handled by the tool. Second, some conditions
allow preemptive filtering of the events. In other words,
the framework configures a priori which events trigger an
exit, instead of filtering out exits caused by uninteresting
events. For example, in the case of the IOOperationPort
event, preemptive filtering means configuring the CPU such
that only I/O operations involving a specific I/O port trigger
an exit. This feature is very important to minimize the
number of exits and thus the overall overhead.
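
As an illustration of preemptive filtering for port-based I/O, the sketch below builds the kind of per-port bitmaps that Intel VT-x uses to select which I/O ports cause an exit: two 4 KB bitmaps, the first covering ports 0x0000-0x7FFF and the second covering ports 0x8000-0xFFFF, with one bit per port. The text above does not state that the framework relies on this exact mechanism, so this is an assumption of the example; the function names are hypothetical and the code only models the data layout in Python.

```python
PORTS_PER_BITMAP = 0x8000          # each 4 KB bitmap covers 32768 ports
BITMAP_BYTES = PORTS_PER_BITMAP // 8


def build_io_bitmaps(ports):
    """Return (bitmap_a, bitmap_b) with one bit set per monitored port."""
    bitmap_a = bytearray(BITMAP_BYTES)  # ports 0x0000-0x7FFF
    bitmap_b = bytearray(BITMAP_BYTES)  # ports 0x8000-0xFFFF
    for port in ports:
        if not 0 <= port <= 0xFFFF:
            raise ValueError(f"invalid I/O port: {port:#x}")
        bitmap = bitmap_a if port < PORTS_PER_BITMAP else bitmap_b
        index = port % PORTS_PER_BITMAP
        bitmap[index // 8] |= 1 << (index % 8)
    return bitmap_a, bitmap_b


def causes_exit(bitmaps, port):
    """Model the hardware check: does an access to `port` exit to root mode?"""
    bitmap = bitmaps[0] if port < PORTS_PER_BITMAP else bitmaps[1]
    index = port % PORTS_PER_BITMAP
    return bool(bitmap[index // 8] & (1 << (index % 8)))
```

With the bitmaps built for a single port, accesses to every other port proceed without any exit, which is exactly the saving that preemptive filtering aims for.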

State inspection and manipulation primitives can be used 
by the tool to access the state of the guest, in order to extract 
more detailed information about events or other data useful
for the analysis. For example, these primitives make it possible to
extract the arguments of an invoked function, or to inspect
the internal structures of the guest operating system. Note 
that, by default, write access to guest state is not granted 
to a tool. If necessary, such permission can be enabled at 
compile-time. Obviously, in this case the framework cannot 
protect the state of the guest from dangerous modifications. 

4.1 Framework and Analysis Tool Loading 

The framework and the analysis tool are loaded by a min- 
imal kernel driver. This is unavoidable since the operations 
we need to perform to load the framework require maxi- 
mum privileges and can be performed only by the kernel of 
the operating system. The driver, however, is very
simple, and we took extreme care to avoid any interference
with the kernel. Moreover, since the framework, once loaded,
is completely invisible to the system, we unload the driver
as soon as the framework has been installed.

When VMX modes are enabled, a special VMX data struc- 
ture (VMCS in Intel terminology) is made accessible initially 
to the loader, and subsequently, when the loading is com- 
pleted, only to the framework. This data structure stores the 
host state, guest state, and the execution control fields. The 
host state stores the state of the processor that is loaded 
on exits to root mode, and consists of the state of all the 
registers of the CPU (except for general purpose registers). 
Similarly, the guest state stores the state of the processor 
that is loaded on entries to non-root mode. The guest state 
is updated automatically at every exit, such that the sub- 
sequent entry to non-root mode will resume the execution 
from the same point. The execution control fields allow a 
fine-grained specification of which events should trigger an 
exit to root mode. 
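
The role of the three areas of the VMX data structure can be made concrete with a small model. The sketch below is written in Python and is only an abstraction: the class Vmcs, the functions guest_event and resume_guest, and the representation of the CPU state as a dictionary are hypothetical, and the real structure holds many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    guest_state: dict                     # loaded on entries to non-root mode
    host_state: dict                      # loaded on exits to root mode
    exit_controls: set = field(default_factory=set)  # events that trigger exits


def guest_event(vmcs, cpu, event):
    """Exit to root mode only if `event` is enabled in the control fields."""
    if event not in vmcs.exit_controls:
        return False                      # the guest continues undisturbed
    vmcs.guest_state = dict(cpu)          # guest state is saved at every exit
    cpu.clear()
    cpu.update(vmcs.host_state)           # fixed entry point and address space
    return True


def resume_guest(vmcs, cpu):
    """Entry to non-root mode: resume the guest from where it was interrupted."""
    cpu.clear()
    cpu.update(vmcs.guest_state)
```

For example, with only "EXCEPTION" enabled in exit_controls, an I/O event leaves the CPU state untouched, while an exception switches the CPU to the host state and a later call to resume_guest restores the saved guest state.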

The task of the loader is to enable VMX modes and to 
configure the VMX data structure such that the
operating system and user-space applications continue to
run in non-root mode, while the framework and the analysis
tool are executed in root mode. Moreover, the loader has 
to configure the CPU such that all the events necessary for 
the tool to trace the execution of the system trigger exits to 
root mode. When the initialization is completed, the driver 
unloads itself and resumes the execution of the system. 

Guest State Configuration. 

The guest state is initialized to the current state of the 
system. In this way, when the virtual machine is launched 
and execution enters non-root mode, the guest operating 
system will resume its execution as if nothing happened. 
A tricky problem when initializing non-root mode concerns 
the management of the memory. More precisely, we must
prevent the newly created guest from using and accessing the physical
memory frames allocated to the framework and to the
tool. Otherwise, the guest could detect and even corrupt
the framework. Most recent CPUs provide hardware facilities
for memory virtualization (e.g., the Intel Extended Page
Table extension). If these facilities are not available, memory
virtualization must be implemented entirely in software.
Briefly, software memory virtualization consists of
intercepting all guest operations that manipulate the page table
(the data structure the CPU uses for virtual-to-physical
address translation) and ensuring that none of the physical
frames allocated to the framework and to the analysis
tool are mapped into the guest. If the guest tries to
map a reserved physical frame, the framework assigns the
guest a different one and hides the difference.
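
The substitution of reserved frames can be sketched as follows. This is a deliberately simplified Python model, not the implementation of the framework: page tables are plain dictionaries, the class ShadowMapper and the pool of spare frames are hypothetical, and issues such as multi-level page tables or exhaustion of the pool are ignored.

```python
class ShadowMapper:
    """Toy model of software memory virtualization with reserved frames."""

    def __init__(self, reserved_frames, spare_frames):
        self.reserved = set(reserved_frames)  # frames holding framework and tool
        self.spare = list(spare_frames)       # hypothetical pool of substitutes
        self.substitute = {}                  # reserved frame -> frame given to guest
        self.guest_view = {}                  # mappings as the guest believes them
        self.shadow = {}                      # mappings actually used by the CPU

    def guest_maps(self, vpage, frame):
        """Handle an intercepted page-table update; return the frame really mapped."""
        self.guest_view[vpage] = frame        # the guest keeps seeing its own value
        if frame in self.reserved:
            if frame not in self.substitute:
                # Raises IndexError if the pool is empty (not handled here).
                self.substitute[frame] = self.spare.pop()
            frame = self.substitute[frame]
        self.shadow[vpage] = frame
        return frame
```

The invariant of interest is that no reserved frame ever appears among the values of shadow, while guest_view still reports the frame the guest asked for.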

Host State Configuration. 

The host state is initialized as follows. The CPU is configured
to use, when in root mode, a dedicated address space
and a dedicated interrupt descriptor table (IDT). This configuration
simplifies the separation of the analyzed system
from the framework and makes it possible to detect and handle interrupts
and exceptions that occur in root mode. Unlike
the address of the entry point of non-root mode, which
is updated at every exit so that execution of
the guest can resume from where it was interrupted, the address of the
entry point of root mode is fixed. The entry point is set
to the address of the routine that takes care of dispatching
an exit event to the appropriate handler and that in turn
might notify the analysis tool (i.e., the entry point of the
event gate).

Execution Control Fields Configuration. 

To reduce the run-time overhead suffered by the guest
system, the execution control fields are configured to minimize
the number of events that trigger an exit to root mode.
When the tool is initialized, it specifies which events must
be intercepted. Subsequently, in response to the invocation
of API functions, the configuration of the execution control
fields can be altered to intercept additional events or to ignore
other ones.

Table 2: Techniques for tracing events

4.2 Execution Tracing 

Table 2 describes the technique used to trace all the events 
currently supported by the framework. Low-level events 
(those with a mark in the last column) correspond directly 
to exits to root mode (e.g., Exception). Other events are 
traced through the aforementioned ones (e.g., BreakpointHit),
and others again are traced through the latter (e.g.,
FunctionEntry).

Events that can be traced directly through the hardware 
are process switches, exceptions, interrupts, and port-based 
I/O operations. All these events exit conditionally: they 
exit to root mode only when requested and can have op- 
tional exit conditions to limit exits to particular situations. 
The remainder of this section presents how we developed the
primitives for tracing higher-level events starting from the 
aforementioned low-level ones. 

Breakpoints and watchpoints are two of the most compli- 
cated events to implement. Modern CPUs provide hardware 
facilities to realize efficient and transparent breakpoints and 
watchpoints. Unfortunately, hardware-assisted breakpoints 
and watchpoints are limited in number (only 4) and shared 
between non-root and root mode. Therefore, they cannot 
be used simultaneously by the analyzed system and by the 
framework. The solution we adopt to allow an arbitrary 
number of breakpoints is to use software breakpoints. A 
software breakpoint is a one-byte instruction that triggers a 
breakpoint exception when executed. Software breakpoints 
are enabled by replacing the byte at the address on which 
we want the breakpoint with the aforementioned instruction. 
When the breakpoint is hit, the original byte is restored and
the tool is notified of the event. If the breakpoint is not
persistent, the execution of the system is simply resumed.
Otherwise, the instruction is emulated and then the breakpoint
is set again. Clearly, this approach to breakpoints is not
transparent to the analyzed system, since it modifies guest
memory. However, it is very efficient.
An alternative and transparent approach is to use the same 
technique we use for watchpoints, as described in the next 
paragraph. Our framework supports both approaches. 
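
The byte-patching scheme described above can be sketched as
follows. This is a minimal Python model, not the code of the
framework: guest memory is a bytearray, and BreakpointManager,
notify, and emulate are hypothetical names. The only hardware
fact assumed is that the x86 one-byte breakpoint instruction
is int3 (opcode 0xCC).

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class BreakpointManager:
    """Toy model of software breakpoints; a bytearray stands in for guest memory."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}  # address -> (original byte, persistent flag)

    def set(self, address, persistent=False):
        # Remember the original byte, then overwrite it with the breakpoint opcode.
        self.saved[address] = (self.memory[address], persistent)
        self.memory[address] = INT3

    def on_hit(self, address, notify, emulate):
        original, persistent = self.saved[address]
        self.memory[address] = original   # restore the original byte
        notify(address)                   # deliver the event to the tool
        if persistent:
            emulate(address)              # execute the original instruction once
            self.memory[address] = INT3   # then set the breakpoint again
        else:
            del self.saved[address]       # one-shot breakpoint: forget it


# Hypothetical usage: one persistent and one one-shot breakpoint.
memory = bytearray([0x55, 0x8B, 0xEC, 0xC3])
hits = []
manager = BreakpointManager(memory)
manager.set(0, persistent=True)
manager.set(3)
manager.on_hit(0, hits.append, lambda address: None)
manager.on_hit(3, hits.append, lambda address: None)
```

After the two hits, the persistent breakpoint at offset 0 is
armed again, while the byte at offset 3 is back to its
original value.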

The approach used in our framework to implement software
watchpoints is based on protecting the memory locations
from any access via hardware (or just from write accesses,
depending on the type of watchpoint), such that any
access results in an exception [21]. More precisely, since 
the finest level of protection offered by the hardware is at 
the page level, we mark the page containing the address on 
which we want to set the watchpoint as "non-present". Any 
future access to this page will result in a page fault exception 
that will be intercepted by our framework. The framework 
analyzes the exception and checks whether the accessed
address corresponds to the address with the watchpoint. If
the watchpoint is hit, the framework delivers the event to
the analysis tool; otherwise, it emulates the instruction and
then resumes the normal execution of the guest. Emulation
is necessary to execute the faulting instruction manually:
to prevent a second fault, the original permission of the
memory page accessed by the instruction must be restored
before the instruction is executed. After the instruction
has executed, the page must be marked again as
"non-present" to catch future accesses.
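
The page-granularity check described above can be modeled in a
few lines. The sketch below is an illustration under stated
assumptions: pages are 4 KB, WatchpointManager and its
callbacks are hypothetical names, and the faulting instruction
is emulated after a hit as well as after a miss (the text only
states the latter explicitly).

```python
PAGE_SIZE = 4096  # assumed page size


class WatchpointManager:
    """Toy model of watchpoints built on pages marked "non-present"."""

    def __init__(self):
        self.watched = {}         # address -> "rw" (any access) or "w" (writes only)
        self.non_present = set()  # page numbers currently marked non-present

    def set(self, address, kind="rw"):
        self.watched[address] = kind
        self.non_present.add(address // PAGE_SIZE)

    def on_page_fault(self, address, is_write, notify, emulate):
        page = address // PAGE_SIZE
        if page not in self.non_present:
            return "guest"                 # a genuine guest fault, not ours
        kind = self.watched.get(address)
        hit = kind == "rw" or (kind == "w" and is_write)
        if hit:
            notify(address, is_write)      # deliver the event to the tool
        # Execute the faulting instruction with the page temporarily present,
        # then protect the page again to catch future accesses.
        self.non_present.discard(page)
        emulate(address)
        self.non_present.add(page)
        return "hit" if hit else "miss"


# Hypothetical usage: a write watchpoint on address 0x1008.
events, emulated = [], []
record = lambda address, is_write: events.append(address)
wp = WatchpointManager()
wp.set(0x1008, kind="w")
results = [
    wp.on_page_fault(0x1008, True, record, emulated.append),   # watched write
    wp.on_page_fault(0x1008, False, record, emulated.append),  # read of a write watchpoint
    wp.on_page_fault(0x1010, True, record, emulated.append),   # same page, other address
    wp.on_page_fault(0x5000, False, record, emulated.append),  # unrelated page
]
```

The third access shows the cost of page-level protection: an
access to an unwatched address on a protected page still
faults and must be emulated.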

Other higher-level events, such as function and system call 
entries and exits, are traced through breakpoints. When the 
analysis tool requests the framework to monitor a certain 
function, the framework sets a breakpoint on the address of 
the entry point of the function. Later, when a breakpoint is 
hit, the framework checks whether the hit breakpoint cor- 
responds to a function entry point and, if so, it delivers the 
appropriate event (i.e., FunctionEntry) to the analysis tool.
Function exits, instead, are traced by setting a breakpoint 
on the return address. The framework discovers the return 
address by setting a breakpoint on the function entry and by 
inspecting the stack frame of the function when the break- 
point on the entry point is hit. A similar approach is used 
for tracing system call entries and exits.
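
As an illustration of how entry and exit events can be derived
from breakpoints, consider the following sketch. It assumes
the x86 calling convention, in which the return address is at
the top of the stack when the entry point is reached.
FunctionTracer, read_stack_top, and the FunctionExit event
name are hypothetical; only FunctionEntry appears in the text.

```python
class FunctionTracer:
    """Toy model: derive function entry/exit events from breakpoint hits."""

    def __init__(self, read_stack_top):
        # read_stack_top is a hypothetical callback returning the value
        # at the top of the guest stack.
        self.read_stack_top = read_stack_top
        self.entries = {}   # entry address -> function name
        self.pending = {}   # return address -> names of calls awaiting exit
        self.events = []

    def trace(self, name, entry_address):
        # In the framework this would also set a breakpoint on entry_address.
        self.entries[entry_address] = name

    def on_breakpoint(self, address):
        if address in self.entries:
            name = self.entries[address]
            # At the entry point, the top of the stack holds the return address.
            return_address = self.read_stack_top()
            self.pending.setdefault(return_address, []).append(name)
            self.events.append(("FunctionEntry", name))
        elif self.pending.get(address):
            # A breakpoint on a recorded return address marks the exit.
            self.events.append(("FunctionExit", self.pending[address].pop()))


# Hypothetical usage with made-up addresses.
tracer = FunctionTracer(read_stack_top=lambda: 0x401005)
tracer.trace("NtCreateFile", 0x805000)
tracer.on_breakpoint(0x805000)   # entry point reached
tracer.on_breakpoint(0x401005)   # return address reached
tracer.on_breakpoint(0x999)      # unrelated breakpoint: no event
```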

The approach for tracing function calls and returns just
described allows tracing specific functions, whose names or
addresses are supplied by the tool. Tracing all function
calls and returns is instead more complicated because
it is not possible to know a priori the addresses of all
functions' entry points. The solution in this case is to perform
a static analysis to identify the addresses of all functions'
entry points (e.g., by recognizing function prologues). This
feature is not yet available in our current implementation of
the framework. Nevertheless, if needed, the static analysis
could be performed directly in the tool. Tracing all
system calls is instead much easier, since they are all invoked
through a common gate. The solution we adopt is to put a
breakpoint on the entry point of the system call gate [6].
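
A tool that wanted to perform the prologue-based static
analysis itself could start from a byte-pattern scan such as
the one below. This is only a heuristic sketch: it looks for
the two common encodings of the 32-bit x86 prologue
"push ebp; mov ebp, esp", misses functions compiled without a
frame pointer, and may report false positives inside data.
The function name and the sample bytes are made up.

```python
# Two encodings of "push ebp; mov ebp, esp" on 32-bit x86.
PROLOGUES = (bytes([0x55, 0x8B, 0xEC]), bytes([0x55, 0x89, 0xE5]))


def find_function_entries(code, base=0):
    """Return the sorted addresses at which a known prologue pattern starts."""
    entries = []
    for pattern in PROLOGUES:
        start = code.find(pattern)
        while start != -1:
            entries.append(base + start)
            start = code.find(pattern, start + 1)
    return sorted(entries)


# Hypothetical usage: two tiny functions separated by a nop.
code = bytes([0x55, 0x8B, 0xEC, 0x5D, 0xC3, 0x90,
              0x55, 0x89, 0xE5, 0xC9, 0xC3])
entries = find_function_entries(code, base=0x1000)
```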

Besides execution tracing facilities, the framework also
exposes to analysis tools the possibility of intercepting I/O
operations with hardware peripherals. Software can interact
with hardware devices through hardware I/O ports, or it can
leverage memory-mapped I/O. In the first case, VMX allows
the operation to be intercepted without any effort: the
framework simply configures the execution control fields such
that all the interactions with the specific hardware ports
trigger an exit to root mode; when such an exit occurs, the
framework notifies the tool by means of an IOOperationPort event.
However, for performance reasons, modern peripherals
typically resort to memory-mapped I/O. In this case, read and
write operations do not involve any hardware port, as they
are performed directly on memory. To intercept such
operations we set a watchpoint on the appropriate memory region.
Thus, when an access to it is detected, the framework
delivers an IOOperationMmap event to the tool.
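
On Intel VT-x, per-port interception is expressed through two
4-KB I/O bitmaps: bitmap A covers ports 0x0000-0x7FFF and
bitmap B covers ports 0x8000-0xFFFF, and a set bit makes an
access to the corresponding port exit to root mode. The sketch
below models this selection logic only; whether the framework
manipulates the bitmaps in exactly this way is an assumption,
and the function names are hypothetical.

```python
def _locate(bitmap_a, bitmap_b, port):
    """Pick the bitmap and bit index that describe a 16-bit I/O port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("I/O ports are 16-bit values")
    if port < 0x8000:
        return bitmap_a, port
    return bitmap_b, port - 0x8000


def set_port_intercept(bitmap_a, bitmap_b, port, enabled=True):
    """Set or clear the bit that makes accesses to `port` exit to root mode."""
    bitmap, index = _locate(bitmap_a, bitmap_b, port)
    if enabled:
        bitmap[index // 8] |= 1 << (index % 8)
    else:
        bitmap[index // 8] &= ~(1 << (index % 8)) & 0xFF


def port_exits(bitmap_a, bitmap_b, port):
    """Return True if an access to `port` would trigger an exit."""
    bitmap, index = _locate(bitmap_a, bitmap_b, port)
    return bool(bitmap[index // 8] & (1 << (index % 8)))


# Hypothetical usage: intercept the PS/2 keyboard data port (0x60).
bitmap_a, bitmap_b = bytearray(4096), bytearray(4096)
set_port_intercept(bitmap_a, bitmap_b, 0x60)
set_port_intercept(bitmap_a, bitmap_b, 0x8001)
```

Clearing a bit with enabled=False corresponds to the case in
which a tool is no longer interested in a port, which keeps
the number of exits to root mode low.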

4.3 State Inspection and Manipulation

Several situations require access to the state of the guest
system in order to inspect, and optionally manipulate, both 
the registers of the CPU and the memory. As an example, 
the framework could need to read the return address of a 
function from the stack, to access the parameters of a system 
call from the processor registers, or to insert a breakpoint 
into the address space of a particular process. Similarly, the 
analysis tool might need to extract data from the memory 
of the guest. 

The inspection and manipulation of CPU registers is a 
straightforward activity. This information is saved during
an exit and restored before an entry. Thus, the inspection 
and manipulation of registers merely consists of reading or 
writing the VMX guest state (or the memory of the frame- 
work, depending on the type of register). 

Inspection and manipulation of memory locations is much 
more complex. When paging is enabled, virtual addresses 
are translated by the hardware into physical addresses ac- 
cording to the content of the page table and direct physical 
addressing is not possible. Each process has its own page 
table; therefore, different processes have different virtual-to- 
physical mappings and a process cannot access the memory 
of the others. The framework is isolated from the guest us- 
ing the same approach and thus it has its own page table 
and its own mapping. Consequently, the framework cannot 
directly access memory locations of guest processes. Moreover,
inspection is complicated by the fact that page tables
cannot be traversed directly via software (but only via
hardware): the page table is a multilevel table and pointers to
lower levels are physical addresses. To overcome this problem
we have developed a specific, OS-independent algorithm that
allows accessing an arbitrary virtual memory location of an
arbitrary process. The core of the algorithm is a primitive
for accessing arbitrary physical memory locations. This is
accomplished by mapping a given physical address p to an
unused virtual address v in the page table of the framework,
and subsequently by accessing v. Then, using this primitive,
the algorithm can traverse the page table of a guest process
via software by iteratively mapping the physical
addresses stored in the table.
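
The traversal can be sketched for the classic two-level 32-bit
x86 page table (4-KB pages, no PAE, large pages ignored). In
this model a dictionary stands in for physical memory, and
read_phys_u32 stands in for the "map p to v, then access v"
primitive; FrameworkModel and all addresses are hypothetical.

```python
FRAME_MASK = 0xFFFFF000  # bits 31..12 of an entry: physical frame address
PRESENT = 0x1            # bit 0 of an entry: present flag


class FrameworkModel:
    """Toy model of a software page-table walk over physical memory."""

    def __init__(self, physical_memory):
        self.physical = physical_memory  # dict: physical address -> 32-bit value
        self.mappings = 0                # how many times the primitive was used

    def read_phys_u32(self, p):
        # Stand-in for: map physical address p to an unused virtual address v
        # in the framework's own page table, then read through v.
        self.mappings += 1
        return self.physical.get(p, 0)

    def translate(self, page_table_base, va):
        """Translate virtual address va of the process with the given base."""
        pde = self.read_phys_u32((page_table_base & FRAME_MASK) + 4 * (va >> 22))
        if not pde & PRESENT:
            return None
        pte = self.read_phys_u32((pde & FRAME_MASK) + 4 * ((va >> 12) & 0x3FF))
        if not pte & PRESENT:
            return None
        return (pte & FRAME_MASK) | (va & 0xFFF)


# Hypothetical layout: page directory at 0x100000, one page table at 0x200000.
physical = {
    0x100004: 0x00200001,  # directory entry 1 -> page table at 0x200000, present
    0x200004: 0x00ABC001,  # table entry 1 -> frame 0xABC000, present
}
framework = FrameworkModel(physical)
mapped = framework.translate(0x100000, 0x00401234)
unmapped = framework.translate(0x100000, 0x00801234)
```

Each level of the table costs one use of the mapping
primitive, so a successful translation here uses it twice.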

The framework exposes memory inspection and manipulation
facilities, based on the aforementioned algorithm, to the
analysis tools through two API functions: GuestRead(p, a, n)
and GuestWrite(p, a, data). The former reads n bytes
starting from virtual address a of process p; the latter writes
the content of buffer data into the address space of process
p, starting from virtual address a. By default, to preserve
the integrity of the guest, all GuestWrite operations are
forbidden. On top of these functions we have built higher-level
ones that facilitate the extraction of function arguments
and null-terminated strings, and the disassembly of code.
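
The shape of this interface, including the default write
protection and one higher-level helper, can be illustrated as
follows. The sketch is not the API of the framework: each
process is modeled as a flat bytearray, and GuestMemory,
GuestWriteForbidden, allow_writes, and read_cstring are
hypothetical names.

```python
class GuestWriteForbidden(Exception):
    """Raised when a tool writes to the guest without permission."""


class GuestMemory:
    """Toy model of the GuestRead/GuestWrite pair over flat address spaces."""

    def __init__(self, address_spaces, allow_writes=False):
        self.address_spaces = address_spaces  # dict: process -> bytearray
        self.allow_writes = allow_writes      # writes are forbidden by default

    def guest_read(self, p, a, n):
        """Read n bytes starting from virtual address a of process p."""
        return bytes(self.address_spaces[p][a:a + n])

    def guest_write(self, p, a, data):
        """Write data into process p, starting from virtual address a."""
        if not self.allow_writes:
            raise GuestWriteForbidden("write access to the guest is disabled")
        self.address_spaces[p][a:a + len(data)] = data

    def read_cstring(self, p, a, limit=256):
        """Higher-level helper: read a null-terminated string byte by byte."""
        out = bytearray()
        for offset in range(limit):
            byte = self.guest_read(p, a + offset, 1)
            if not byte or byte == b"\x00":
                break
            out += byte
        return bytes(out)


# Hypothetical usage: process 0x1000 holds a string at offset 2.
space = bytearray(b"\x00\x00notepad.exe\x00junk")
memory = GuestMemory({0x1000: space})
name = memory.read_cstring(0x1000, 2)
try:
    memory.guest_write(0x1000, 2, b"X")
    write_result = "written"
except GuestWriteForbidden:
    write_result = "forbidden"
```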

4.4 Tool Isolation 

To be able to use our infrastructure on a production system,
it is essential to guarantee that any defect in the analysis
tool will not affect the stability of the analyzed system
or of the framework. To this end, the framework controls
the execution of the analysis tool and, if any anomalous
behavior is observed, the whole infrastructure is automatically
unloaded.

As we outlined at the beginning of this section, even if 
the analysis tool is executed in VMX root mode, it is still
constrained to a less privileged execution mode than the
framework. Thus, any operation the tool performs on the 
guest must be mediated by the framework. This is exactly 
what happens in traditional operating systems: a user-mode 
process cannot access directly the resources of the operating 
system, nor those of other user-mode processes, and any ac- 
tion it performs outside its address space must be mediated 
by the kernel. Similarly in our context, to perform an opera- 
tion on the guest system, the tool must use the programming 
interface offered by the framework. 

In the default configuration, the framework does not allow
a tool write access to the state of the guest.
However, there is still the possibility that the execution of
an instruction of the tool raises an unexpected exception
(e.g., a page fault on memory access, or a general protection
fault). When such an event occurs, the framework has
no way to handle the anomalous situation and to allow the
tool to continue its execution. The only viable approach that
also preserves the integrity of the guest system is to
terminate the analysis tool and to remove the framework. To this
end, the solution we adopt is to intercept unexpected
exceptions through the custom interrupt descriptor table (IDT)
installed when launching VMX modes. The trap is dispatched
through the IDT to a trap gate whose handler eventually
unloads the framework.

Another problem that might arise with a buggy analysis tool
is non-termination: if the analysis tool entered an infinite
loop, the guest system would never be resumed. To prevent
this problem we added to the framework a minimalistic
watchdog and set a time limit on the execution of the tool.
The limit is not on the whole execution time of the tool, but
rather on the time taken to handle a single event. Thus, the
analysis tool could potentially run forever, but with the
guarantee that the execution of the analyzed system will be
resumed within the specified time limit. To this end, before
delivering an event to the analysis tool, the framework resets
a timer. Then, while the tool handles the event, the framework
periodically regains control of the execution and checks
whether the time limit has been exceeded. To do that, the
framework registers, in the IDT, a custom interrupt handler
for timer interrupts and programs the interrupt controller to
deliver only timer interrupts (this is necessary to prevent the
framework from consuming interrupts destined for the other
devices). Before returning to non-root mode, the framework
reprograms the interrupt controller to deliver all the
interrupts to the analyzed system.
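
The per-event time limit can be modeled with a tick counter
that is reset before each event is delivered. In this sketch
timer interrupts are simulated by the steps of an iterator,
and Watchdog, deliver_event, and the returned strings are
hypothetical; what exactly happens to a tool that exceeds the
limit is an assumption here.

```python
import itertools


class ToolTimeout(Exception):
    """Raised when a tool exceeds the time limit for handling one event."""


class Watchdog:
    """Toy model of a watchdog driven by timer interrupts."""

    def __init__(self, limit_ticks):
        self.limit_ticks = limit_ticks
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0

    def on_timer_interrupt(self):
        # Called each time the framework regains control through the timer.
        self.elapsed += 1
        if self.elapsed > self.limit_ticks:
            raise ToolTimeout("event handler exceeded its time limit")


def deliver_event(watchdog, handler_steps):
    """Run a handler; each item of handler_steps stands for one timer tick."""
    watchdog.reset()                 # the limit applies to each event separately
    try:
        for _ in handler_steps:
            watchdog.on_timer_interrupt()
        return "guest resumed"
    except ToolTimeout:
        return "tool terminated"


# Hypothetical usage: two well-behaved handlers, then an infinite loop.
watchdog = Watchdog(limit_ticks=3)
outcomes = [
    deliver_event(watchdog, range(3)),
    deliver_event(watchdog, range(3)),
    deliver_event(watchdog, itertools.count()),
]
```

The first two deliveries together consume more ticks than the
limit, yet both succeed, because the limit is on the handling
of a single event rather than on the whole run of the tool.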

4.5 OS-dependent Interface 

Our framework provides a general programming interface 
completely independent from the operating system running 
inside the guest. However, in many cases some OS-specific 
facilities can ease the analysis of the guest. As an example, 
the only OS-independent manner to identify a process is by 
means of the base address of its page table (typically stored 
inside the cr3 CPU register). However, it is quite awkward 
to refer to processes using page table base addresses, and it 
is more natural to identify a process through its process id 
(PID) or through the name of the application it executes. 

The OS-dependent interface we provide leverages virtual
machine introspection techniques [9] to analyze the internal
structures of the guest operating system and to translate
OS-independent information (e.g., the process with page table
base address 0x15cdc000) into something more user-friendly
(e.g., the process notepad.exe). Moreover, using debugging
symbols, the framework can resolve the names and addresses
of symbols (e.g., functions and global variables). In this
way, a tool can ask to interrupt the execution of the guest
when function NtCreateFile is invoked, instead of referencing
the function through its address. Similarly, when a function
is invoked, it is possible to inspect its call stack, to
resolve the names of the calling functions, and even to recover
the libraries to which the various functions belong. Some of
the OS-dependent functionalities provided are summarized
in Table 3.

GetFuncAddr(n)    Return the address of the function n
GetFuncName(a)    Return the name of the function at address a
GetProcName(p)    Get the name of the process with page directory base address p
GetProcPID(p)     Get the PID of the process with page directory base address p
GetProcLibs(p)    Enumerate the dynamically linked libraries loaded into process p
GetProcStack(p)   Get the stack base for process p
GetProcHeap(p)    Get the heap base for process p
GetProcList()     Enumerate processes
GetDriverList()   Enumerate device drivers

Table 3: OS-dependent API

In case the guest operating system is not supported, the 
OS-dependent module is disabled, and only OS-independent 
functionalities are available. Our current implementation 
offers an OS-dependent interface only for the Windows XP 
operating system. 

5. APPLICATIONS 

In this section we present HyperDbg, an interactive kernel
debugger for Microsoft Windows XP we built on top of
our framework. To contribute to the open source
community, we released the code of HyperDbg, along with
the code of the framework, under the GPL (v3.0) license.
The code is available at the following address:

http://security.dico.unimi.it/hyperdbg/

The section also discusses other possible applications that 
could be constructed using our framework. 

5.1 HyperDbg 

HyperDbg is an interactive kernel debugger we developed 
on top of our analysis framework. It offers all the features 
commonly found in kernel-level debuggers but, being com- 
pletely run in VMX root mode, it is OS-independent and 
grants complete transparency to the guest operating system 
and its applications. The debugger provides a simple graph- 
ical user interface to ease the interaction with the user. This 
interface is activated in two circumstances: (i) when the user 
presses a special hot-key or (ii) when the debugger receives 
the notification for an event that requires the attention of 
the user (e.g., when a breakpoint is hit). From this interface
the user interacts with the debugger and can perform several
operations, including setting breakpoints and watchpoints,
tracing functions and system calls, and inspecting and
manipulating the state of the guest (since all interactive
debuggers allow the state of the debuggee to be modified, we
decided to enable write access to the guest as well).

Figure 3: HyperDbg in action



Figure 3 shows HyperDbg in action. In particular, the
figure shows the debugger reporting the event that
interrupted the execution of the analyzed system, displaying a
fragment of the code of the process currently running in the
analyzed system, and displaying a "backtrace" of the function
calls that are currently active. Additionally, the debugger
displays information about the status of the registers at the
time the event occurred (in the figure, the event is
the pressing of the hot-key). To facilitate the analysis, the
debugger leverages OS-dependent information. For example,
the screenshot in Figure 3 shows that the debugger
resolved the ID and the name of the process in an MS Windows
XP guest, by knowing how the process table is managed by
the operating system.

It is worth pointing out that HyperDbg can be used to 
debug any piece of code of the guest system, including crit- 
ical components such as the process scheduler, or interrupt 
and exception handlers. Indeed, Figure 3 shows that the 
guest operating system has been stopped while executing 
the PS/2 keyboard/mouse driver (i8042prt.sys). Because
the framework on which the debugger is built
is completely transparent to the analyzed system,
the user can use the keyboard to interact with the debug- 
ger even though the keyboard driver of the guest is being 
debugged. 

HyperDbg consists of fewer than 1600 lines of code: ~25%
of the code implements the graphical interface, ~23% of
the code provides the facilities required for keyboard-based
user interaction, and the remaining ~52% is responsible for
handling events and for all the other interactions with the
framework. Note that certain functionalities (e.g.,
disassembling a code region) are implemented directly in the
framework since, most likely, they will be used for other types
of analysis as well. The framework is about four times bigger
than the debugger (without considering the disassembly
module embedded in the framework, as it is based on an
off-the-shelf disassembler). We believe these numbers are
significant: the small number of lines of code we had to
write to implement HyperDbg shows that complex
analysis tools like an interactive kernel debugger are
straightforward to implement using our framework.

Note: The screenshot in Figure 3 was taken using our
development environment, based on an Intel x86 emulator
supporting extensions for virtualization (i.e., BOCHS).

The remainder of this section describes how we used the
facilities of the framework to implement the user interface 
and the component to receive commands from the user. 

User Interface. 

Although the graphical user interface of the debugger is
rudimentary, its implementation is very challenging. The
reason for the complexity is that we cannot rely on any
high-level graphical facility available in the analyzed system
to render the interface. Such an approach would be too
OS-dependent and not transparent at all. The lack of graphical
primitives obliged us to interact directly with the video card.
The video memory is mapped at a fixed address in the guest
and thus the unmodified inspection and manipulation API (i.e.,
GuestRead and GuestWrite) can be used by the debugger
to render the interface. Note that this approach depends
neither on the OS nor on the hardware. We developed a
small video library that provides basic graphical
functionalities and translates our requests into data that are
written directly into the memory of the video card. Before
rendering the graphical interface to the screen, the debugger
backs up the content of the video memory and restores it
right before resuming the execution of the analyzed system.

User Interaction. 

User interaction is keyboard-based. When in non-root
mode, the user can switch into HyperDbg by pressing a
hot-key. Then, in root mode the user can control the
debugger. For these reasons, HyperDbg must be able to
intercept keystrokes both in root and non-root mode. To
intercept keystrokes in non-root mode we monitor all the read
operations from the hardware I/O port devoted to the
keyboard. In other words, HyperDbg registers with the core for
all the IOOperationPort events that satisfy the event
condition port = KEYBOARD_PORT && access = read. When such
an operation is detected, HyperDbg checks whether the key
pressed corresponds to the hot-key that enables the
debugger. If the key pressed matches the hot-key, the debugger
pops up the graphical interface and waits for commands.
Otherwise, the debugger passes the keystroke to the
analyzed system such that the latter will continue its execution
as if the keystroke were read directly from the keyboard.
Keyboard handling in root mode is done by polling the
keyboard hardware I/O port. Since direct access to I/O ports
is not permitted to any analysis tool, the debugger relies on
an API function exported by the framework which mediates
all accesses to I/O ports and allows (if the permission is
granted at compile time) certain analysis tools to read data
from certain I/O ports.
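
The event condition and the hot-key check in non-root mode
amount to a small filter, sketched below. The concrete values
are assumptions made for the example: 0x60 is the PS/2
keyboard data port, and 0x58 (the F12 make code in scan code
set 1) is used as the hot-key; the dictionary layout of an
event and the function names are hypothetical.

```python
KEYBOARD_PORT = 0x60    # PS/2 keyboard data port (assumed value)
HOTKEY_SCANCODE = 0x58  # F12 make code, scan code set 1 (assumed hot-key)


def matches_condition(event):
    """Event condition: port = KEYBOARD_PORT && access = read."""
    return event["port"] == KEYBOARD_PORT and event["access"] == "read"


def dispatch(event):
    """Decide what to do with an I/O port event raised in non-root mode."""
    if matches_condition(event) and event["value"] == HOTKEY_SCANCODE:
        return "open debugger"
    # Any other keystroke is handed back as if read directly from the keyboard.
    return "pass to guest"


# Hypothetical usage with three port events.
decisions = [
    dispatch({"port": 0x60, "access": "read", "value": 0x58}),
    dispatch({"port": 0x60, "access": "read", "value": 0x1E}),
    dispatch({"port": 0x60, "access": "write", "value": 0x58}),
]
```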

5.2 Other Possible Uses of the Framework 

HyperDbg demonstrates that our framework is very versatile
and that it enables new opportunities for dynamic analysis,
which we will explore in our future research.

An interesting extension of HyperDbg would be
support for kernel-level omniscient debugging. Omniscient
debugging allows developers to inspect the status of their
programs at past execution instants, in order to detect the
cause of a failure without the need to run the target program
multiple times [16]. HyperDbg could be extended to allow a
user to record and inspect the values a memory location
held over time, and the exceptions and interrupts that
occurred. Such a feature would make it easier for a user to
discover when a memory location of the kernel gets corrupted
and which instruction is responsible for the corruption.
Moreover, the ability to log asynchronous events, such as
interrupts, would make it possible to spot defects connected to
non-deterministic behaviors of the analyzed system. Our
framework already offers all the necessary facilities for this
kind of debugging: exceptions and interrupts can be traced
natively by the framework and memory accesses can be traced
using watchpoints.
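
As a rough illustration of what such an extension would
record, the sketch below keeps a chronological log of
watchpoint writes and interrupts and answers the question
"which instruction first corrupted this location?". It is a
hypothetical design, not part of HyperDbg; ExecutionHistory
and its methods are made-up names.

```python
class ExecutionHistory:
    """Toy chronological log of memory writes and interrupts."""

    def __init__(self):
        self.records = []  # list of (kind, details) in order of occurrence

    def on_write(self, address, value, instruction_pointer):
        # Would be fed by watchpoint events.
        self.records.append(("write", (address, value, instruction_pointer)))

    def on_interrupt(self, vector):
        # Would be fed by natively traced interrupt events.
        self.records.append(("interrupt", vector))

    def values_of(self, address):
        """Return every value written to address, oldest first."""
        return [details[1] for kind, details in self.records
                if kind == "write" and details[0] == address]

    def first_corrupting_write(self, address, is_valid):
        """Return (record index, instruction pointer) of the first bad write."""
        for index, (kind, details) in enumerate(self.records):
            if kind == "write" and details[0] == address and not is_valid(details[1]):
                return index, details[2]
        return None


# Hypothetical usage with made-up addresses and values.
history = ExecutionHistory()
history.on_write(0x8000, 1, 0x401000)
history.on_interrupt(0x21)
history.on_write(0x8000, 0xDEAD, 0x402000)
history.on_write(0x9000, 5, 0x403000)
culprit = history.first_corrupting_write(0x8000, lambda value: value < 0x100)
```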

Another interesting application of our framework will be 
dynamic aspect-oriented programming of operating system 
kernels. As discussed in Section 2, several approaches have 
been proposed to apply AOP to kernels. The main advan- 
tage offered by our framework over the approaches proposed 
so far is that it does not require any modification of the 
source code of the kernel, nor any modification of the image 
in memory of the kernel. Moreover, our framework pro- 
tects the running kernel from defects in the woven code. 
One approach to facilitate the use of such technology would
be to provide programmers with a source-to-source translator
that translates aspect-oriented code written in languages like
AspectC [5] into C code that uses the API offered by our
framework. In particular, the translator would be responsible
for translating pointcuts into API calls that trace the
corresponding events, using advice as event handlers, and for
translating all pointer dereferences into calls to the
inspection API to read the memory of the guest.

6. CONCLUSIONS 

We proposed a framework to perform complex run-time
analyses of both system- and user-level code on commodity
production systems. The framework exposes an API that
eases the development of analysis tools on top of it. The
approach we described leverages hardware extensions for
virtualization available on modern processors to overcome the
limitations that affect existing approaches for the analysis of
system-level code. In particular, the solution we proposed
does not require recompiling or rebooting the target system,
it is not invasive, it is almost completely OS-independent,
and it guarantees that a defect in an analysis tool cannot
damage the framework itself or the analyzed system. To
demonstrate its potential, we developed HyperDbg, an
interactive kernel-level debugger for Microsoft Windows XP.
HyperDbg and the framework have been released as an
open source package.
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ABSTRACT 

Wo propose a framework tfiat provides a programming in- 
terface to perform complex dynamic system-level analyses 
of deployed production systems. By leveraging hardware 
support for virtualization available nowadays on all com- 
modity machines, our framework is completely transparent 
to the system under analysis and it guarantees isolation of 
the analysis tools running on its top. Thus, the internals 
of the kernel of the running system needs not to be modi- 
fied and the whole platform runs unaware of the framework. 
Moreover, errors in the analysis tools do not affect the run- 
ning system and the framework. This is accomplished by 
installing a minimalistic virtual machine monitor and mi- 
grating the system, as it runs, into a virtual machine. In 
order to demonstrate the potentials of our framework we 
developed an interactive kernel debugger, nicknamed Hy- 
perDbg. HyperDbg can be used to debug any critical 
kernel component, and even to single step the execution of 
exception and interrupt handlers. 

Categories and Subject Descriptors 

D.2.5 [Software Engineering]: Testing and Debugging — 
Debugging aids, Monitors, Tracing; D.4.9 [Operating Sys- 
tems]: Systems Programs and Utilities 

General Terms 

Verification 

Keywords 

hardware virtualization, debugging, system analysis 

1. INTRODUCTION 

Operating systems are peculiar and very complex pieces 
of software whose internals are critically vital for a system: 
a failure, or a bottleneck, in any of their parts can lead to 
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catastrophic consequences. Therefore, special care is needed 
to develop, analyze, test, and profile them. To simplify their 
task, developers and analysts rely on a large variety of tools 
and analysis techniques. Some of them are specific for study- 
ing static properties of the operating system, while others 
are more specific for studying dynamic properties. In par- 
ticular, the latter class of tools and techniques is nowadays 
very popular among kernel developers and analysts because 
it allows them to collect the information very quickly, while 
hiding many of the intricacies of the kernel, and can even be 
used on running production systems. 

Existing approaches for dynamic analysis of operating sys- 
tems (e.g., debugging, profiling, and tracing) can be roughly 
classified in two groups: kernel-based and VMM-based. The 
approach taken by the first group is to include some com- 
ponent into the kernel in order to intercept all the events of 
interest {e.g., the creation of a new process, the execution of 
a system call, and the execution of a kernel function) and to 
execute a specific action when such events occur [?, ?, ?, ?, 
?]. This solution requires the installation of specific hooks 
in the kernel to monitor run-time events and it might be 
very difficult to apply to operating systems that do not na- 
tively offer facilities for dynamic analysis, especially when 
the source code is not available. The approach taken by 
the second group is to run the kernel and user-space appli- 
cations in a virtual machine and to intercept, and respond 
to, the events of interest from the virtual machine moni- 
tor (VMM) [?]. Although this approach guarantees trans- 
parency and has a loose dependency on the operating sys- 
tem internals, it cannot be used in all the settings, since it 
implies that the system must be run as a guest of a vir- 
tual machine and production systems not running in virtual 
machines cannot be analyzed. Moreover, VMM-based solu- 
tions typically virtualize hardware devices, to allow multiple 
guests to share the same physical peripherals. This makes 
software virtualization approaches unsuitable to assist the 
analysis of components that need to interact directly with 
the underlying hardware. 

In this paper we propose a framework that brings to- 
gether the advantages of both approaches: it can be used on 
commodity production systems (i.e., off-the-shelf products, 
whose source code or debugging symbols are not necessarily 
available), since it does not require to instrument the sys- 
tem under test, and it is able to inspect systems running on 
real hardware, since it does not require an emulation con- 
tainer. Similarly to existing frameworks, the analyses that 
can be built on top of our framework include profiling and 



tracing of the kernel and user-space applications, interactive 
debugging, or even extension of system features. However, 
differently from existing frameworks, ours is fully dynamic, 
transparent, loosely dependent on the operating system, and 
fault-tolerant with respect to possible defects in the anal- 
ysis code. First, our framework does not require recompi- 
lation or rebooting of the target system. Thus, it can be 
used to analyze any running production system, including 
commodity operating systems lacking native support for in- 
strumentation and systems not running in virtual machines. 
Second, the framework is not invasive, since analyses can be 
performed on a virtually unmodified system: as explained in 
the paper, only a minimal driver needs to be installed and no 
parts of the kernel are patched in any way. Moreover, since 
the framework itself is not accessible from the target system, 
its code cannot be detected by malicious code or unwittingly 
influence buggy operating system components. Thus, the 
infrastructure can be applied to any operating system, as 
the majority of the facilities it supports are completely OS- 
indopendent, and the only OS-dependent functionalities are 
just provided to ease the development of analysis tools. Fi- 
nally, the framework is fault-tolerant, as it guarantees that 
a defect in an analysis tool built on top of it do not damage 
the framework itself nor the analyzed system. 

Our framework leverages hardware extensions for virtual- 
ization available on commodity x86 CPUs [?, ?]. Hardware- 
support for virtualization allows the development of virtual 
machine monitors that are very efficient, completely trans- 
parent, and non invasive to the systems running in the vir- 
tual machine. To overcome the major limitation of tradi- 
tional VMM-based approaches {i.e., the impossibility to an- 
alyze productions systems not running in a virtual machine), 
our framework exploits a feature of the hardware that allows 
to install a virtual machine monitor and to migrate a run- 
ning system into a virtual machine. When the analysis is 
completed, the original mode of operation of the system can 
be restored. Practically speaking, our framework is a mini- 
malistic virtual machine monitor acting as a broker between 
the analyzed system and the analysis tool. The framework 
abstracts low-level events occurring in the analyzed system 
into high-level events and guarantees fault-tolerance by re- 
lying on the hardware to run the analysis tool in a isolated 
execution environment. 

To demonstrate the potentials of our framework we have 
developed an interactive kernel debugger, nicknamed Hy- 
perDbg, constructed entirely using the programming in- 
terface exposed by our infrastructure. HyperDbg adds 
live and interactive debugging support to Microsoft Win- 
dows XP, so far only possible using very invasive tools, like 
Syser [?], or traditional VMM-based debuggers. HyperDbg 
can be used to debug any component of the Windows kernel, 
including interrupt /exception handlers, device drivers, and 
even supports single instruction stepping. Being completely 
separated from the debuggee, HyperDbg is transparent to 
the analyzed system and can be even used to analyze pro- 
tected and malicious code. 

In summary, the paper makes the following contributions. 

1. We propose a framework to perform complex dynamic 
system-level analyses of commodity production sys- 
tems. Compared to existing frameworks, the one we 
propose guarantees transparency, efficiency, and does 
not require the target system to be already installed 
on a virtual machine. 



2. We implemented our framework in an experimental 
prototype for Microsoft Windows XP. 

3. We describe the design and the implementation of Hy- 
perDbg, a kernel-level interactive debugger built on 
top our framework. 

Both the analysis framework and HyperDbg are avail- 
able at http://security.dico.unimi.it/hyperdbg/ and is 
released under the terms and conditions of the GPL (v3.0) 
license. 

2. RELATED WORK 

The framework proposed in this paper shares many
similarities with frameworks and techniques extensively explored
in the past. However, by exploiting recent facilities available
on modern Intel x86 CPUs, our framework is able to combine
and to offer simultaneously the main benefits introduced by
previous research work.

Dynamic Kernel Instrumentation. 

DTrace is a facility included in the Solaris kernel that
allows the dynamic instrumentation of production systems [?].
The key points of DTrace are efficiency and flexibility. First,
the instrumentation framework itself introduces no overhead.
Second, the framework provides tens of thousands of
instrumentation points, and the actions to be taken can be
expressed in a high-level control language that also includes
a number of mechanisms to guarantee run-time safety. Similarly,
KernInst is a dynamic instrumentation framework for commodity
kernels [?]. KernInst has been developed mainly to gather
information about the performance of a running kernel, but it
has also been employed for run-time kernel optimization.
Unlike DTrace, KernInst does not provide any mechanism for
run-time safety of the instrumentation routines. Unfortunately,
the aforementioned approaches are not transparent, as they
require direct modifications of the operating system kernel,
achieved by loading a kernel-mode module. Moreover, none of
them is OS-independent, and they cannot be applied to
closed-source operating systems. Our framework does not suffer
from these limitations since it can instrument the kernel
without modifying it and does not rely on any facility offered
by the kernel.

Kernel-level Debugging. 

Several efforts have been made to develop efficient and
reliable kernel-level debuggers. Indeed, these applications
are essential for many activities, such as the development
of device drivers. One of the first and most widely used
kernel-level debuggers that targeted the Microsoft Windows
operating system was SoftICE [?], but the project has since
been discontinued. However, both commercial [?] and
open-source [?] alternatives to SoftICE have appeared. Modern
versions of Windows already include a kernel debugging
subsystem [?]. Unfortunately, to exploit the full capabilities
of Microsoft's debugging infrastructure, the host being
debugged must be physically linked (e.g., by means of a serial
cable) with another machine. All these approaches share a
common factor: to debug kernel-level code, they leverage
another kernel-level module. Obviously, that is like a dog
chasing its tail. The framework proposed in this paper
requires neither kernel support nor run-time modifications of
the kernel to add the missing support.

Figure 1: Overview of the framework



Frameworks Based on Virtual Machines. 

Instead of relying on a kernel-level module to monitor
other kernel code, an alternative approach consists of
running the target code inside a virtual machine and performing
the required analyses from the outside [?]. In [?, ?, ?]
the authors propose virtual machines with execution replaying
capabilities: a user can move forward and backward through
the execution history of the whole system, both for debugging
and for understanding how a hacker intrusion took place.
Finally, in [?] Chow et al. propose Aftersight, a system that
decouples execution recording from execution trace analysis,
thus reducing the overhead suffered by the system where the
guest operating system is run. Nowadays, Aftersight is part of
the VMware platform, and other mainstream commercial products
provide similar capabilities. The framework proposed in this
paper can provide these functionalities even on systems not
running in any virtual machine.

Aspect-oriented Programming. 

Aspect-oriented programming is a paradigm that promises
to increase modularity by encapsulating cross-cutting concerns
into separate code units, called "aspects", whose "advice"
code is woven into the system automatically, by specifying
the properties of the join-points. AspectC is an
aspect-oriented framework that is used to customize (at
compile-time) operating system kernels [?, ?, ?]. More dynamic
approaches have been proposed: for example, TOSKANA provides
before, after, and around advice for in-kernel functions
and supports the implementation of aspects themselves as
dynamically exchangeable kernel modules [?]. The framework
proposed in this paper makes it possible to achieve the same
goal while being transparent and fault-tolerant.

3. OVERVIEW OF THE FRAMEWORK 

Figure 1 depicts the architecture of our framework, the
installation and removal processes, and the migration of the
operating system and its applications into a virtual machine.
Our framework consists of a virtual machine monitor (VMM
for short) that provides a programming interface for the
development of system-level analysis tools. As in traditional
VMM-based analysis approaches, the analysis tool is run
within the VMM and is thus completely transparent to guests
of the virtual machine. However, compared to traditional
VMM-based approaches, ours does not require the system to be
already running inside any virtual machine. To achieve this
goal, our framework leverages hardware extensions for
virtualization available on all modern x86 CPUs [?, ?] (which
are unused in the majority of deployments). In short, these
extensions augment the instruction set architecture with two
new modes of operation: VMX root mode and VMX non-root
mode^1. These new modes of operation logically separate
the virtual machine monitor from a guest without having to
modify the latter. More precisely, we exploit a particular
feature of these extensions that allows for late launching of
VMX modes. Late launching of VMX modes makes it possible to
install a virtual machine monitor even if the system has
already been bootstrapped. In other words, late launching
allows us to migrate (temporarily) a running operating system
into a virtual machine, and to analyze and control the
execution of the system from the monitor. Throughout the rest
of the paper, we use the term "guest" to refer to the system
under analysis that has been migrated into a virtual machine.

Practically speaking, the running operating system is not 
migrated anywhere and not touched at all. Rather, by 
launching VMX modes, the execution environment is ex- 
tended with the two aforementioned operating modes; the 
running operating system is then associated with non-root 
mode, while the VMM is associated with root mode. Thus, 
in all respects, the operating system and its applications be- 
come a guest of our special virtual machine. Following the 
same principle, the VMM can be unloaded, and the original 
mode of execution of the operating system restored, by sim- 
ply disabling VMX modes. After the launch of the VMX 
modes, the execution of the guest can continue exactly as 
before, even in terms of interactions with the underlying 
hardware devices. However, during its execution, the guest 
might be interrupted by an exit to root mode. Like hardware 
exceptions, exits are events that block the execution of the 
guest, switch from non-root mode to root mode, and transfer 
the control to the VMM. Differently from exceptions, the set 
of events triggering exits to root mode can be configured dy- 
namically by the VMM. A routine of the VMM handles the 
exit and eventually enters non-root mode to resume the ex- 
ecution of the guest. Being executed at the highest privilege 
level, the routine handling the exit has complete read/write
control of the state of the guest system (of both memory
and CPU registers).

The framework itself does not perform any analysis. It is
only responsible for handling a small set of exits to control
all accesses to the memory management unit of the CPU, to
prevent the guest from accessing the physical memory
locations holding the code and the data of the framework. On
the other hand, the framework provides a flexible API to
develop tools that perform sophisticated analyses of both
kernel and user code running in the guest. Using the
functionalities provided through the API, the tool can request
the framework to monitor certain events that might occur
during the execution of the guest; when such events occur, it
can inspect, and even manipulate, the state of the guest. The
events that can be monitored include, but are not limited to,
system call invocations, function calls, context switches, and
I/O operations. Practically speaking, events are monitored
through exits to root mode. Thus, a request of the analysis
tool to monitor a certain high-level event (e.g., the
execution of a system call) is translated by the API of the
framework into a sequence of low-level operations that
guarantee that all the occurrences of such an event in the
guest trigger an exit to root mode. Similarly, the framework
translates the exit into a higher-level event and notifies the
analysis tool of its occurrence. Once notified, the tool can
recover information about the event (e.g., arguments and
return value of a system call), using the inspection
functionalities offered by the API.

^1 VMX (non-)root mode is the terminology used by Intel;
AMD adopts a different terminology.

Figure 2: A close-up of the framework

An important requirement for the analysis of production 
systems is that analysis tools must not interfere with the 
correct execution of the guest. This is particularly impor- 
tant for faults and deadlocks that might occur in the analysis 
tool. The approach we adopt is to run the tool in a less privi- 
leged execution environment, isolated from the analyzed sys- 
tem and from the framework. The tool can interact with the 
guest only through the API exposed by the framework. This 
approach guarantees the framework the ability to intercept
any fault occurring in the tool, to mediate all accesses to
the analyzed system (and to prevent write accesses), and to
terminate the tool in case of deadlocks or other anomalous
situations.

4. DESIGN AND IMPLEMENTATION 

Figure 2 shows a more detailed view of the architecture 
of our framework. Intuitively, this architecture is very simi- 
lar to that of traditional operating systems: the framework 
plays the role of the kernel and the analysis tool plays the 
role of a user-space application. As will become clear later, 
this architecture prevents buggy analysis tools from compro- 
mising the guest system and the framework. The separation 
between these two parts is made possible by the fact that, 
when VMX is enabled, root and non-root modes offer two 
fully-featured execution environments. Thus, like the guest 
running in non-root mode, the framework running in root 
mode can rely on privilege separation to isolate the analysis 
tool and can handle independently interrupts and exceptions 
that might occur while executing in root mode. 

When an exit to root mode interrupts the execution of
the guest, the event is delivered to the event gate (step 1
in Figure 2). The event gate is responsible for abstracting
low-level events into higher-level ones, and for notifying
the analysis tool if the latter has requested it (step 2).
On startup the analysis tool requests the framework to be
notified of certain events (not shown in the figure). The
tool can use the API provided by the framework to query extra
information about the event (e.g., the content of the stack
location storing one of the arguments of a function). Since
the tool is isolated from the framework, API functions are
invoked through software interrupts. Thus, requests coming
from the analysis tool are received by the trap gate (step
3), then forwarded to the component implementing the API
(step 4). The tool can perform two types of API calls: to
inspect or manipulate the state of the guest (step 4a), and
to control event notifications, e.g., to enable or disable
the notification of certain events (step 4b). Note that the
component implementing the API is also used by the framework
itself (step 5) to recover extra information about events
(e.g., the return address of a function stored in the stack).
The trap gate also serves the purpose of detecting exceptions
(e.g., page faults) that might occur during the execution of
the analysis tool. If the trap gate intercepts an exception
(step 6), it terminates the faulty tool and unloads the
framework, to resume the normal operation mode of the system.
Finally, the trap gate is also used to handle timer interrupts
(step 7), which, as will be discussed in Section 4.4, are
employed to enforce a time bound on the execution of the tool.
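
The role of the trap gate in steps 6 and 7 can be illustrated
with a small Python model. This is only a sketch under stated
assumptions: the function deliver_event and the three example
handlers are hypothetical names, a Python exception stands in
for a hardware exception raised by the tool, and each yield
stands in for one timer interrupt.

```python
def deliver_event(handler, event, max_ticks=1000):
    """Run a tool's event handler under the trap gate (toy model).

    `handler` is a generator function; each `yield` stands for one timer
    interrupt reaching the trap gate. Returns "resume" when the guest can
    continue, or "unload" when the tool is terminated and the framework
    removed. All names here are hypothetical.
    """
    ticks = 0
    try:
        for _ in handler(event):
            ticks += 1
            if ticks > max_ticks:      # step 7: time bound exceeded
                return "unload"
    except Exception:                  # step 6: the tool faulted
        return "unload"
    return "resume"


def well_behaved(event):
    yield  # a little work, then the handler returns


def faulty(event):
    yield
    raise MemoryError("simulated page fault in the tool")


def runaway(event):
    while True:  # never terminates on its own
        yield


print(deliver_event(well_behaved, "SyscallEntry"))
print(deliver_event(faulty, "SyscallEntry"))
print(deliver_event(runaway, "SyscallEntry"))
```

In this model the bound applies to the handling of a single
event, so a tool may run indefinitely as long as each
notification is handled within the limit.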

The functionalities provided by the API of the framework
can be classified into two classes: execution and I/O tracing,
and state inspection and manipulation. The following
paragraphs briefly describe the API. More details are given in
Sections 4.2 and 4.3.

Execution and I/O tracing facilities allow a tool to
intercept the occurrence in the analyzed system of certain
events and certain I/O operations, respectively. Table 1
reports the main types of events that can be traced. For each
event, the table also reports the arguments associated with
the event; arguments are the pieces of information about an
event that tools most commonly use. For example, the events
FunctionEntry and SyscallEntry are used to trace functions and
system calls, respectively. The arguments associated with the
FunctionEntry event are the address (or the name) of the
function called, the caller, and the return address. Another
example is the ProcessSwitch event, which can be used to trace
context switches between processes (not threads). From the
point of view of the analysis tool all the events are handled
in the same way: the tool can subscribe to any event and,
when the event occurs, can inspect its arguments and take
the proper actions. However, at the framework level, events
differ. Some of them (e.g., context switches between
processes) can be traced directly by the hardware; that is,
the event triggering the exit corresponds exactly to the event
being traced. Other events (e.g., function calls and returns)
cannot be traced directly by the hardware. In all these cases
the framework relies on other low-level events to trace the
execution and then abstracts the low-level exit events into
higher-level ones, meaningful for the analysis tool.
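
The difference between natively traced and derived events can
be sketched in a few lines of Python. The model below is an
illustration only: the class EventGate, its methods, the exit
cause strings, and the function name and address used in the
example are all hypothetical and are not the framework's API.

```python
from collections import defaultdict


class EventGate:
    """Toy model: maps low-level exits to high-level events."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # event name -> callbacks
        self.function_entries = {}             # breakpoint address -> name

    def subscribe(self, event, callback):
        self.subscribers[event].append(callback)

    def trace_function(self, name, address):
        # A function call cannot be traced natively: a breakpoint on the
        # entry point stands for the FunctionEntry event.
        self.function_entries[address] = name

    def on_exit(self, cause, **info):
        """Called on every exit to root mode; returns the event name."""
        if cause == "page_table_change":
            # Native event: the exit cause is exactly the traced event.
            event, args = "ProcessSwitch", info
        elif cause == "breakpoint" and info["address"] in self.function_entries:
            # Derived event: the breakpoint exit becomes a FunctionEntry.
            name = self.function_entries[info["address"]]
            event, args = "FunctionEntry", {"name": name, **info}
        elif cause == "breakpoint":
            event, args = "BreakpointHit", info
        else:
            return None   # exit not associated with any traced event
        for callback in self.subscribers[event]:
            callback(args)
        return event


gate = EventGate()
seen = []
gate.subscribe("FunctionEntry", seen.append)
gate.trace_function("example_function", 0x1000)  # made-up name and address
gate.on_exit("breakpoint", address=0x1000)
print(seen)
```

The tool only ever sees FunctionEntry; the fact that the event
was reconstructed from a breakpoint exit stays inside the gate.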

Event            Description                          Arguments
ProcessSwitch    Context (process) switch             -
Exception        Exception                            Exception vector, faulty instruction, error code
Interrupt        Hardware or software interrupt       Interrupt vector, requesting instruction
BreakpointHit    Execution breakpoint                 Breakpoint address
WatchpointHit    Watchpoint on data read/write        Watchpoint address, access type, hitting instruction
FunctionEntry    Function call                        Function name/address, caller/return address
FunctionExit     Return from function                 Function name/address, return address
SyscallEntry     System call invocation               System call number, caller/return address
SyscallExit      Return from system call              System call number, return address
IOOperationPort  I/O operation through hardware port  Port number, access type
IOOperationMmap  Memory-mapped I/O operation          Memory address, access type

Table 1: Events traceable using our framework and corresponding arguments (the argument that represents
the current process is omitted, as it is common to all the events)

Arguments can optionally be used as conditions, to limit
the tracing to a subset of all the events. Conditions on events
serve two purposes. First, conditions simplify the analysis
tools, since events that do not match the requested
conditions are discarded by the framework and thus do not
need to be handled by the tool. Second, some conditions
allow preemptive filtering of the events. In other words,
the framework configures a priori which events trigger an
exit, instead of filtering out exits caused by uninteresting
events. For example, in the case of the IOOperationPort
event, preemptive filtering means configuring the CPU such
that only I/O operations involving a specific I/O port trigger
an exit. This feature is very important to minimize the
number of exits and thus the overall overhead.
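
The benefit of preemptive filtering can be made concrete with
a toy Python comparison. The sketch assumes a per-port bitmap
in the spirit of the I/O bitmaps offered by VMX, flattened
here into a single array; the class PortFilter, the function
filter_after_exit, and the sample port numbers are
hypothetical.

```python
class PortFilter:
    """Toy model of preemptive filtering for port I/O (one bit per port)."""

    NUM_PORTS = 0x10000

    def __init__(self):
        self.bitmap = bytearray(self.NUM_PORTS // 8)  # all zero: no exits
        self.exits = 0

    def watch(self, port):
        # Configure a priori that accesses to this port trigger an exit.
        self.bitmap[port // 8] |= 1 << (port % 8)

    def guest_io(self, port):
        """Return True if the access exits to root mode."""
        if self.bitmap[port // 8] & (1 << (port % 8)):
            self.exits += 1
            return True
        return False


def filter_after_exit(accesses, wanted_port):
    """Non-preemptive alternative: every access exits, most are discarded."""
    exits = len(accesses)
    delivered = [p for p in accesses if p == wanted_port]
    return exits, delivered


accesses = [0x60, 0x3F8, 0x64, 0x60, 0x1F0]   # sample guest port accesses
pf = PortFilter()
pf.watch(0x60)
delivered = [p for p in accesses if pf.guest_io(p)]
print(pf.exits, delivered)
print(filter_after_exit(accesses, 0x60))
```

Both strategies deliver the same two events to the tool, but
the preemptive one pays for two exits instead of five.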

State inspection and manipulation primitives can be used
by the tool to access the state of the guest, in order to
extract more detailed information about events or other data
useful for the analysis. For example, these primitives make it
possible to extract the arguments of an invoked function, or
to inspect the internal structures of the guest operating
system. Note that, by default, write access to guest state is
not granted to a tool. If necessary, such permission can be
enabled at compile-time. Obviously, in this case the framework
cannot protect the state of the guest from dangerous
modifications.

4.1 Framework and Analysis Tool Loading 

The framework and the analysis tool are loaded by a minimal
kernel driver. This is unavoidable since the operations
we need to perform to load the framework require maximum
privileges and can be performed only by the kernel of
the operating system. The driver, however, is very simple,
and we took extreme care to avoid any interference with the
kernel. Moreover, since the framework is completely invisible
to the system once loaded, we unload the driver as soon as
the framework has been installed.

When VMX modes are enabled, a special VMX data struc- 
ture (VMCS in Intel terminology) is made accessible initially 
to the loader, and subsequently, when the loading is com- 
pleted, only to the framework. This data structure stores the 
host state, guest state, and the execution control fields. The 
host state stores the state of the processor that is loaded 
on exits to root mode, and consists of the state of all the 
registers of the CPU (except for general purpose registers). 
Similarly, the guest state stores the state of the processor 
that is loaded on entries to non-root mode. The guest state 
is updated automatically at every exit, such that the sub- 
sequent entry to non-root mode will resume the execution 
from the same point. The execution control fields allow a 
fine-grained specification of which events should trigger an 
exit to root mode. 

The task of the loader is to enable VMX modes and to
configure the VMX data structure such that the operating
system and user-space applications continue to run in non-root
mode, while the framework and the analysis tool are executed
in root mode. Moreover, the loader has to configure the CPU
such that all the events necessary for the tool to trace the
execution of the system trigger exits to root mode. When the
initialization is completed, the driver unloads itself and
resumes the execution of the system.

Guest State Configuration. 

The guest state is initialized to the current state of the
system. In this way, when the virtual machine is launched
and execution enters non-root mode, the guest operating
system will resume its execution as if nothing happened.
A tricky problem when initializing non-root mode concerns
the management of the memory. More precisely, we must
prevent the newly created guest from using and accessing the
physical memory frames allocated to the framework and to the
tool. Otherwise, the guest could detect and even corrupt
the framework. Most recent CPUs provide hardware facilities
for memory virtualization (e.g., Intel's Extended Page
Tables). If these facilities are not available, memory
virtualization must be implemented entirely in software.
Briefly, software memory virtualization consists of
intercepting all guest operations that manipulate the page
table (the data structure the CPU uses for virtual-to-physical
address translation) and ensuring that none of the physical
frames allocated to the framework and to the analysis
tool are mapped into the guest. In case the guest tries to
map a reserved physical frame, the framework assigns the
guest a different one and hides the difference.
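
The remapping of reserved frames can be sketched as follows.
This Python model is a simplification made for illustration:
the class ShadowMapper and its methods are hypothetical, page
tables are reduced to dictionaries, and the way the real
framework hides the substitution from the guest is not
described in this detail in the text.

```python
class ShadowMapper:
    """Toy model of software memory virtualization (hypothetical names)."""

    def __init__(self, reserved_frames, spare_frames):
        self.reserved = set(reserved_frames)  # frames owned by framework/tool
        self.spare = list(spare_frames)       # frames usable as replacements
        self.guest_view = {}   # virtual page -> frame the guest asked for
        self.real_map = {}     # virtual page -> frame actually mapped
        self.substitute = {}   # reserved frame -> replacement frame

    def guest_map(self, vpage, frame):
        """Intercepted guest request to map `vpage` onto `frame`."""
        self.guest_view[vpage] = frame
        if frame in self.reserved:
            # Assign a different frame; reuse it on later requests.
            if frame not in self.substitute:
                self.substitute[frame] = self.spare.pop()
            frame = self.substitute[frame]
        self.real_map[vpage] = frame

    def guest_read_mapping(self, vpage):
        # The guest is shown the value it wrote, not the replacement.
        return self.guest_view[vpage]


mapper = ShadowMapper(reserved_frames={0x100, 0x101},
                      spare_frames=[0x900, 0x901])
mapper.guest_map(0x10, 0x200)   # ordinary frame: mapped as requested
mapper.guest_map(0x11, 0x100)   # reserved frame: silently replaced
print(hex(mapper.real_map[0x10]), hex(mapper.real_map[0x11]))
print(hex(mapper.guest_read_mapping(0x11)))
```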

Host State Configuration. 

The host state is initialized as follows. The CPU is
configured to use, when in root mode, a dedicated address space
and a dedicated interrupt descriptor table (IDT). This
configuration simplifies the separation of the analyzed system
from the framework and makes it possible to detect and handle
interrupts and exceptions that occur in root mode. Unlike
the address of the entry point of non-root mode, which
is updated at every exit so that the execution of the guest
can resume from where it was interrupted, the address of the
entry point of root mode is fixed. The entry point is set
to the address of the routine that takes care of dispatching
an exit event to the appropriate handler and that in turn
might notify the analysis tool (i.e., the entry point of the
event gate).

Execution Control Fields Configuration. 

To reduce the run-time overhead suffered by the guest
system, the execution control fields are configured to
minimize the number of events that trigger an exit to root mode.
When the tool is initialized, it specifies which events must
be intercepted. Subsequently, in response to the invocation
of API functions, the configuration of the execution control
fields can be altered to intercept additional events or to
ignore other ones.

Event            Exit cause                          Native exit
ProcessSwitch    Change of page table address        x
Exception        Exception                           x
Interrupt        Interrupt                           x
BreakpointHit    Debug except. / Page fault except.
WatchpointHit    Page fault except.
FunctionEntry    Breakpoint on function entry point
FunctionExit     Breakpoint on return address
SyscallEntry     Breakpoint on syscall entry point
SyscallExit      Breakpoint on return address
IOOperationPort  Port read/write                     x
IOOperationMmap  Watchpoint on device memory

Table 2: Techniques for tracing events

4.2 Execution Tracing 

Table 2 describes the technique used to trace all the events
currently supported by the framework. Low-level events
(those with a mark in the last column) correspond directly
to exits to root mode (e.g., Exception). Other events are
traced through the aforementioned ones (e.g., BreakpointHit),
and others again are traced through the latter (e.g.,
FunctionEntry).

Events that can be traced directly through the hardware 
are process switches, exceptions, interrupts, and port-based 
I/O operations. All these events exit conditionally: they 
exit to root mode only when requested and can have op- 
tional exit conditions to limit exits to particular situations. 
The remainder of this section presents how we developed the
primitives for tracing higher-level events starting from the
aforementioned low-level ones.

Breakpoints and watchpoints are two of the most compli- 
cated events to implement. Modern CPUs provide hardware 
facilities to realize efficient and transparent breakpoints and 
watchpoints. Unfortunately, hardware-assisted breakpoints 
and watchpoints are limited in number (only 4) and shared 
between non-root and root mode. Therefore, they cannot 
be used simultaneously by the analyzed system and by the 
framework. The solution we adopt to allow an arbitrary 
number of breakpoints is to use software breakpoints. A 
software breakpoint is a one-byte instruction that triggers a 
breakpoint exception when executed. Software breakpoints 
are enabled by replacing the byte at the address on which 
we want the breakpoint with the aforementioned instruction. 
When the breakpoint is hit, the original byte is restored and 
the event is notified to the tool. If the breakpoint is not per- 
sistent the execution of the system is resumed. Otherwise 
the instruction is emulated and then the breakpoint is set 
again. Clearly, this approach to breakpoints is not trans- 
parent for the analyzed system. However, it is very efficient. 
An alternative and transparent approach is to use the same 
technique we use for watchpoints, as described in the next 
paragraph. Our framework supports both approaches. 
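
The life cycle of a software breakpoint can be sketched with
a byte array standing in for guest memory. The value 0xCC is
the x86 one-byte breakpoint instruction (INT3); everything
else, including the class BreakpointManager and its callbacks,
is a hypothetical model and not the framework's code.

```python
INT3 = 0xCC  # x86 one-byte breakpoint instruction


class BreakpointManager:
    """Toy model of software breakpoints on a bytearray of guest memory."""

    def __init__(self, memory):
        self.memory = memory
        self.saved = {}   # address -> (original byte, persistent flag)

    def set(self, address, persistent=False):
        self.saved[address] = (self.memory[address], persistent)
        self.memory[address] = INT3          # patch in the breakpoint

    def on_hit(self, address, notify, emulate):
        original, persistent = self.saved[address]
        self.memory[address] = original      # restore the original byte
        notify(address)                      # deliver the event to the tool
        if persistent:
            emulate(address)                 # run the original instruction
            self.memory[address] = INT3      # then set the breakpoint again
        else:
            del self.saved[address]          # one-shot: just resume


memory = bytearray([0x55, 0x8B, 0xEC, 0xC3])  # push ebp; mov ebp, esp; ret
hits, emulated = [], []
bp = BreakpointManager(memory)

bp.set(0)                                   # one-shot breakpoint
bp.on_hit(0, hits.append, emulated.append)
bp.set(3, persistent=True)                  # persistent breakpoint
bp.on_hit(3, hits.append, emulated.append)
print(memory.hex(), hits, emulated)
```

The patched byte is visible to any guest code that reads the
same address, which is why this approach is efficient but not
transparent.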

The approach used in our framework to implement software
watchpoints is based on protecting the memory locations
from any access via hardware (or just from write
accesses, depending on the type of watchpoint), such that
any access results in an exception [?]. More precisely, since
the finest level of protection offered by the hardware is at
the page level, we mark the page containing the address on
which we want to set the watchpoint as "non-present". Any
future access to this page will result in a page fault exception
that will be intercepted by our framework. The framework
analyzes the exception and checks whether the accessed
address corresponds to the address with the watchpoint. If
the watchpoint is hit, the framework delivers the event to
the analysis tool; otherwise it emulates the instruction and
then resumes the normal execution of the guest. Emulation
is necessary to execute the faulting instruction manually.
Indeed, to prevent a second fault, the original permission
of the memory page accessed by the instruction must be
restored before executing the faulting instruction. After the
execution of the instruction, the page must be marked again
as "non-present" to catch future accesses.
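
The page-granularity check can be modeled in Python as
follows. The sketch assumes 4 KB pages and, as a simplifying
assumption, emulates the faulting access whether or not the
watchpoint is hit; the class WatchpointManager and its result
strings are hypothetical.

```python
PAGE_SIZE = 4096


class WatchpointManager:
    """Toy model of page-protection watchpoints (hypothetical names)."""

    def __init__(self):
        self.watched = set()        # exact addresses carrying a watchpoint
        self.non_present = set()    # page numbers marked "non-present"
        self.delivered = []         # WatchpointHit events sent to the tool

    def set_watchpoint(self, address):
        self.watched.add(address)
        self.non_present.add(address // PAGE_SIZE)

    def guest_access(self, address):
        page = address // PAGE_SIZE
        if page not in self.non_present:
            return "no exit"             # ordinary access, no page fault
        # Page fault intercepted by the framework.
        hit = address in self.watched
        if hit:
            self.delivered.append(address)
        # Emulate the faulting instruction: restore the page, perform the
        # access, then protect the page again to catch future accesses.
        self.non_present.discard(page)
        self.non_present.add(page)
        return "watchpoint" if hit else "spurious fault"


wm = WatchpointManager()
wm.set_watchpoint(0x401008)
print(wm.guest_access(0x401008))   # watched address
print(wm.guest_access(0x401100))   # same page, different address
print(wm.guest_access(0x500000))   # unrelated page
```

The second access shows the cost of page granularity: it
faults and must be emulated even though no watchpoint is set
on that address.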

Other higher-level events, such as function and system call
entries and exits, are traced through breakpoints. When the
analysis tool requests the framework to monitor a certain
function, the framework sets a breakpoint on the address of
the entry point of the function. Later, when a breakpoint is
hit, the framework checks whether the hit breakpoint
corresponds to a function entry point and, if so, it delivers
the appropriate event (i.e., FunctionEntry) to the analysis
tool. Function exits, instead, are traced by setting a
breakpoint on the return address. The framework discovers the
return address by setting a breakpoint on the function entry
and by inspecting the stack frame of the function when the
breakpoint on the entry point is hit. A similar approach is
used for tracing system call entries and exits.
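
How a FunctionExit is derived from a FunctionEntry can be
sketched in Python. The model assumes the x86 convention that
the return address sits on top of the stack when the entry
point is reached; the class FunctionTracer, the callback
read_stack_top, and the addresses used are hypothetical.

```python
class FunctionTracer:
    """Toy model: derives FunctionEntry/FunctionExit from breakpoint hits."""

    def __init__(self):
        self.entry_points = {}      # entry address -> function name
        self.pending_returns = {}   # return address -> stack of names
        self.events = []

    def trace(self, name, entry):
        self.entry_points[entry] = name   # breakpoint on the entry point

    def on_breakpoint(self, address, read_stack_top):
        if address in self.entry_points:
            name = self.entry_points[address]
            # At the entry point the return address is on top of the stack;
            # a breakpoint is placed there to observe the exit.
            ret = read_stack_top()
            self.pending_returns.setdefault(ret, []).append(name)
            self.events.append(("FunctionEntry", name, ret))
        elif self.pending_returns.get(address):
            name = self.pending_returns[address].pop()
            self.events.append(("FunctionExit", name, address))


tracer = FunctionTracer()
tracer.trace("foo", 0x1000)
stack = [0x2040]                                  # return address pushed by call
tracer.on_breakpoint(0x1000, lambda: stack[-1])   # entry breakpoint hit
tracer.on_breakpoint(0x2040, lambda: None)        # return breakpoint hit
print(tracer.events)
```

Keeping a stack of names per return address lets the model
cope with recursive calls that return to the same location.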

The approach for tracing function calls and returns just
described makes it possible to trace specific functions, whose
names or addresses are supplied by the tool. Tracing all
function calls and returns is instead more complicated because
it is not possible to know a priori the addresses of all
functions' entry points. The solution in this case is to
perform a static analysis to identify the addresses of all
functions' entry points (e.g., by recognizing function
prologues). This feature is not yet available in our current
implementation of the framework. Nevertheless, if needed, the
static analysis could be performed directly in the tool.
Tracing all system calls is instead much easier, since they are
all invoked through a common gate. The solution we adopt is
to put a breakpoint on the entry point of the system call
gate [?].

Besides execution tracing facilities, the framework also
offers analysis tools the possibility of intercepting I/O
operations with hardware peripherals. Software can interact
with hardware devices through hardware I/O ports, or it can
leverage memory-mapped I/O. In the first case, VMX makes it
possible to intercept the operation without any effort: the
framework simply configures the execution control fields such
that all the interactions with the specific hardware ports
trigger an exit to root mode; when such an exit occurs, the
framework notifies the tool by means of an IOOperationPort
event. However, for performance reasons, modern peripherals
typically resort to memory-mapped I/O. In this case, read and
write operations do not involve any hardware port, as they
are performed directly on memory. To intercept such
operations we set a watchpoint on the appropriate memory region.
Thus, when an access to it is detected, the framework
delivers an IOOperationMmap event to the tool.

4.3 State Inspection and Manipulation

Several situations require access to the state of the guest
system in order to inspect, and optionally manipulate, both
the registers of the CPU and the memory. As an example,
the framework could need to read the return address of a
function from the stack, to access the parameters of a system
call from the processor registers, or to insert a breakpoint
into the address space of a particular process. Similarly, the
analysis tool might need to extract data from the memory
of the guest.

The inspection and manipulation of CPU registers is a
straightforward activity. The register values are saved during
an exit and restored before an entry. Thus, the inspection
and manipulation of registers merely consists of reading or
writing the VMX guest state (or the memory of the framework,
depending on the type of register).

Inspection and manipulation of memory locations is much
more complex. When paging is enabled, virtual addresses
are translated by the hardware into physical addresses
according to the content of the page table, and direct physical
addressing is not possible. Each process has its own page
table; therefore, different processes have different
virtual-to-physical mappings and a process cannot access the
memory of the others. The framework is isolated from the guest
using the same approach and thus it has its own page table
and its own mapping. Consequently, the framework cannot
directly access memory locations of guest processes. Moreover,
inspection is complicated by the fact that page tables
cannot be traversed directly via software (but only via
hardware): the page table is a multilevel table and pointers to
lower levels are physical. To overcome this problem we have
developed a specific, OS-independent algorithm that makes it
possible to access an arbitrary virtual memory location of an
arbitrary process. The core of the algorithm is a primitive
for accessing arbitrary physical memory locations. This is
accomplished by mapping a given physical address p to an
unused virtual address v in the page table of the framework,
and subsequently by accessing v. Then, using this primitive,
the algorithm can traverse the page table of a process
of the guest via software by iteratively mapping the physical
addresses stored in the table.
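
The traversal can be sketched in Python for the classic
two-level 32-bit x86 layout (10-bit directory index, 10-bit
table index, 12-bit offset, 4-byte entries with the present
flag in bit 0). Large pages and PAE are ignored, and the
paper does not state which paging mode its prototype handles,
so this layout is an assumption. The class Framework and its
methods are hypothetical, and read_physical merely stands in
for the map-then-access primitive described above.

```python
import struct

PAGE = 4096
PRESENT = 0x1


class Framework:
    """Toy model of a software page-table walk over simulated RAM."""

    def __init__(self, physical_memory):
        self.phys = physical_memory   # bytearray modelling physical RAM

    def read_physical(self, p, n):
        # Stand-in for: map physical address p at an unused virtual address
        # v in the framework's own page table, then read through v.
        return bytes(self.phys[p:p + n])

    def _entry(self, table_base, index):
        raw = self.read_physical(table_base + 4 * index, 4)
        (entry,) = struct.unpack("<I", raw)
        if not entry & PRESENT:
            raise LookupError("page not present")
        return entry & 0xFFFFF000     # physical base stored in the entry

    def translate(self, page_dir_base, vaddr):
        """Translate vaddr using the page directory at page_dir_base."""
        table = self._entry(page_dir_base & 0xFFFFF000, vaddr >> 22)
        frame = self._entry(table, (vaddr >> 12) & 0x3FF)
        return frame | (vaddr & 0xFFF)


# Build a tiny address space: directory at 0x1000, table at 0x2000,
# and one data frame at 0x5000 (all values made up for the example).
ram = bytearray(16 * PAGE)
struct.pack_into("<I", ram, 0x1000 + 4 * 1, 0x2000 | PRESENT)
struct.pack_into("<I", ram, 0x2000 + 4 * 3, 0x5000 | PRESENT)
ram[0x5010:0x5015] = b"hello"

fw = Framework(ram)
paddr = fw.translate(0x1000, 0x00403010)
print(hex(paddr), fw.read_physical(paddr, 5))
```

Each level of the walk needs only the physical-access
primitive, which is what makes the algorithm independent of
the guest operating system.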

The framework exposes memory inspection and manipulation
facilities, based on the aforementioned algorithm, to the
analysis tools through two API functions: GuestRead(p, a, n)
and GuestWrite(p, a, data). The former reads n bytes
starting from virtual address a of process p; the latter writes
the content of buffer data into the address space of process
p, starting from virtual address a. By default, to preserve
the integrity of the guest, all GuestWrite operations are
forbidden. On top of these functions we have built higher-level
ones that facilitate the extraction of function arguments and
null-terminated strings, and the disassembly of code.
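
A possible shape for these two functions, and for a string
helper built on top of them, is sketched below in Python. The
argument order follows the text, but everything else is an
assumption: the class Guest, the dictionary-based page maps,
the allow_writes flag (the paper enables writes at compile
time), and the helper guest_read_cstring are hypothetical.

```python
PAGE = 4096


class Guest:
    """Toy model of GuestRead/GuestWrite over simulated physical memory."""

    def __init__(self, ram, page_maps, allow_writes=False):
        self.ram = ram                  # bytearray modelling physical RAM
        self.page_maps = page_maps      # process -> {virtual page: frame}
        self.allow_writes = allow_writes

    def _translate(self, p, a):
        return self.page_maps[p][a // PAGE] * PAGE + a % PAGE

    def guest_read(self, p, a, n):
        """Read n bytes at virtual address a of process p."""
        out = bytearray()
        while n > 0:
            chunk = min(n, PAGE - a % PAGE)   # never cross a page boundary
            phys = self._translate(p, a)
            out += self.ram[phys:phys + chunk]
            a, n = a + chunk, n - chunk
        return bytes(out)

    def guest_write(self, p, a, data):
        if not self.allow_writes:
            raise PermissionError("GuestWrite is forbidden by default")
        for i, byte in enumerate(data):
            self.ram[self._translate(p, a + i)] = byte

    def guest_read_cstring(self, p, a, limit=256):
        """Higher-level helper: read a null-terminated string."""
        out = bytearray()
        while len(out) < limit:
            byte = self.guest_read(p, a + len(out), 1)
            if byte == b"\x00":
                break
            out += byte
        return bytes(out)


ram = bytearray(8 * PAGE)
maps = {"proc": {0x10: 5, 0x11: 2}}       # adjacent pages, scattered frames
ram[0x5FFA:0x6000] = b"hello,"            # last 6 bytes of frame 5
ram[0x2000:0x2007] = b" world\x00"        # first bytes of frame 2
guest = Guest(ram, maps)
print(guest.guest_read("proc", 0x10FFA, 12))
print(guest.guest_read_cstring("proc", 0x10FFA))
```

The example string straddles two virtual pages backed by
non-contiguous frames, which is why the read has to be split
and translated page by page.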

4.4 Tool Isolation 

To be able to use our infrastructure on a production
system, it is essential to guarantee that any defect in the
analysis tool will not affect the stability of the analyzed
system and of the framework. To this end, the framework
controls the execution of the analysis tool and, if any
anomalous behavior is observed, the whole infrastructure is
automatically unloaded.

As we outlined at the beginning of this section, even if
the analysis tool is executed in VMX root mode, it is still
confined to a less privileged execution mode than the
framework. Thus, any operation the tool performs on the
guest must be mediated by the framework. This is exactly
what happens in traditional operating systems: a user-mode
process cannot directly access the resources of the operating
system, nor those of other user-mode processes, and any
action it performs outside its address space must be mediated
by the kernel. Similarly, in our context, to perform an
operation on the guest system, the tool must use the
programming interface offered by the framework.

In the default configuration, the framework does not allow a
tool to write to the state of the guest. However, there is
still the possibility that the execution of an instruction of
the tool raises an unexpected exception (e.g., a page fault on
memory access, or a general protection fault). When such an
event occurs, the framework has no way to handle the anomalous
situation and to allow the tool to continue its execution. The
only viable approach that also preserves the integrity of the
guest system is to terminate the analysis tool and to remove
the framework. To this end, the solution we adopt is to
intercept unexpected exceptions through the custom interrupt
descriptor table (IDT) installed when launching VMX mode. The
exception is dispatched through this IDT to a trap gate whose
handler eventually unloads the framework.

Another problem that might arise with a buggy analysis tool is
non-termination: if the analysis tool entered an infinite
loop, the guest system would never be resumed. To prevent this
problem we added to the framework a minimalistic watchdog and
set a time limit on the execution of the tool. The limit is
not on the whole execution time of the tool, but rather on the
time taken to handle a single event. Thus, the analysis tool
could potentially run forever, but with the guarantee that the
execution of the analyzed system will be resumed within the
specified time limit. To this end, before delivering an event
to the analysis tool, the framework resets a timer. Then,
while the tool handles the event, the framework periodically
regains control of the execution and checks whether the time
limit has been exceeded. To do so, the framework registers, in
the IDT, a custom interrupt handler for timer interrupts and
programs the interrupt controller to deliver only timer
interrupts (this is necessary to prevent the framework from
consuming interrupts destined for all the other devices).
Before returning to non-root mode, the framework reprograms
the interrupt controller to deliver all the interrupts to the
analyzed system.
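
The per-event watchdog can be sketched as follows. This is only
a simulation under stated assumptions: time is counted in timer
ticks rather than measured by hardware, a tool handler is
modeled as a Python generator in which each yield marks a point
where a timer interrupt fires, and all names (Watchdog,
ToolTimeout, deliver_event) are hypothetical.

```python
class ToolTimeout(Exception):
    """Raised when a handler exceeds the per-event time limit."""


class Watchdog:
    def __init__(self, limit_ticks):
        self.limit_ticks = limit_ticks
        self.elapsed = 0

    def reset(self):
        # Called by the framework right before an event is delivered.
        self.elapsed = 0

    def on_timer_interrupt(self):
        # Called each time the framework regains control on a timer tick.
        self.elapsed += 1
        if self.elapsed > self.limit_ticks:
            raise ToolTimeout("event handler exceeded its time limit")


def deliver_event(watchdog, handler, event):
    """Run one handler under the watchdog.

    Returns True if the handler finished in time, False if it had to be
    terminated (the point at which the infrastructure would be unloaded).
    """
    watchdog.reset()
    try:
        for _ in handler(event):  # each yield models one timer interrupt
            watchdog.on_timer_interrupt()
    except ToolTimeout:
        return False
    return True


def well_behaved(event):
    for _ in range(3):
        yield


def runaway(event):
    while True:  # a buggy tool stuck in an infinite loop
        yield


dog = Watchdog(limit_ticks=5)
results = [deliver_event(dog, well_behaved, "breakpoint") for _ in range(4)]
stuck = deliver_event(dog, runaway, "breakpoint")
```

Because the timer is reset for each event, the four well-behaved
deliveries all succeed although together they use more than five
ticks; only the handler that never returns is cut off.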

4.5 OS-dependent Interface 

Our framework provides a general programming interface 
completely independent from the operating system running 
inside the guest. However, in many cases some OS-specific 
facilities can ease the analysis of the guest. As an example, 
the only OS-independent manner to identify a process is by 
means of the base address of its page table (typically stored 
inside the cr3 CPU register). However, it is quite awkward 
to refer to processes using page table base addresses, and it 
is more natural to identify a process through its process id 
(PID) or through the name of the application it executes. 

The OS-dependent interface we provide leverages virtual
machine introspection techniques [?] to analyze the internal
structures of the guest operating system and to translate
OS-independent information (e.g., process with page table
base address 0x15cdc000) into something more user-friendly
(e.g., process notepad.exe). Moreover, using debugging
symbols, the framework can resolve the names and addresses of
symbols (e.g., functions and global variables). In this way,
a tool can ask to interrupt the execution of the guest when
function NtCreateFile is invoked, instead of referencing the
function through its address. Similarly, when a function is
invoked, it is possible to inspect its call stack, to resolve
the names of the caller functions, and even to recover the
libraries to which the various functions belong. Some of the
OS-dependent functionalities provided are summarized in
Table 3.

GetFuncAddr(n)     Return the address of the function n
GetFuncName(a)     Return the name of the function at address a
GetProcName(p)     Get the name of the process with page directory base address p
GetProcPID(p)      Get the PID of the process with page directory base address p
GetProcLibs(p)     Enumerate the dynamically linked libraries loaded into process p
GetProcStack(p)    Get the stack base for process p
GetProcHeap(p)     Get the heap base for process p
GetProcList()      Enumerate processes
GetDriverList()    Enumerate device drivers

Table 3: OS-dependent API

In case the guest operating system is not supported, the 
OS-dependent module is disabled, and only OS-independent 
functionalities are available. Our current implementation 
offers an OS-dependent interface only for the Windows XP 
operating system. 
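
The fallback between the two interfaces can be illustrated with
a short sketch. It assumes that introspection has already
produced a table mapping page table base addresses to a PID and
a process name; the class, the table and describe_process are
hypothetical, and the sample values are made up for the example.

```python
class OSDependentInterface:
    def __init__(self, process_table=None):
        # process_table: hypothetical result of introspecting the guest,
        # mapping a page table base address to (pid, name).
        # None models an unsupported guest OS (module disabled).
        self.process_table = process_table

    @property
    def enabled(self):
        return self.process_table is not None

    def get_proc_pid(self, p):
        entry = self.process_table.get(p)
        return None if entry is None else entry[0]

    def get_proc_name(self, p):
        entry = self.process_table.get(p)
        return None if entry is None else entry[1]


def describe_process(osdep, cr3):
    """Prefer the friendly OS-dependent form, else fall back to cr3."""
    if osdep.enabled:
        name = osdep.get_proc_name(cr3)
        if name is not None:
            return f"{name} (PID {osdep.get_proc_pid(cr3)})"
    return f"process with page table base {cr3:#010x}"


supported = OSDependentInterface({0x15CDC000: (1234, "notepad.exe")})
unsupported = OSDependentInterface()

friendly = describe_process(supported, 0x15CDC000)
raw = describe_process(unsupported, 0x15CDC000)
```

A tool written this way keeps working on an unsupported guest,
only with less readable output.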

5. APPLICATIONS 

In this section we present HyperDbg, an interactive kernel
debugger for Microsoft Windows XP that we built on top of our
framework. In an effort to contribute to the open source
community, we released the code of HyperDbg, along with the
code of the framework, under the GPL (v3.0) license. The code
is available at the following address:

http://security.dico.unimi.it/hyperdbg/

The section also discusses other possible applications that 
could be constructed using our framework. 

5.1 HyperDbg 

HyperDbg is an interactive kernel debugger we developed
on top of our analysis framework. It offers all the features
commonly found in kernel-level debuggers but, since it runs
entirely in VMX root mode, it is OS-independent and
completely transparent to the guest operating system and its
applications. The debugger provides a simple graphical user
interface to ease the interaction with the user. This
interface is activated in two circumstances: (i) when the user
presses a special hot-key or (ii) when the debugger receives
the notification of an event that requires the attention of
the user (e.g., when a breakpoint is hit). From this interface
the user interacts with the debugger and can perform several
operations, including setting breakpoints and watchpoints,
tracing functions and system calls, and inspecting and
manipulating the state of the guest (since all interactive
debuggers allow the state of the debuggee to be modified, we
decided to enable write access to the guest as well).
Figure 3: HyperDbg in action



Figure 3 shows HyperDbg in action. In particular, the
figure shows the debugger notifying the event that
interrupted the execution of the analyzed system, displaying a
fragment of the code of the process currently running in the
analyzed system, and displaying a "backtrace" of the function
calls that are currently active. Additionally, the debugger
displays the contents of the registers at the time the event
occurred (in the case of the figure, the event is the pressing
of the hot-key). To facilitate the analysis, the debugger
leverages OS-dependent information. For example, the
screenshot in Figure 3 shows that the debugger resolved the ID
and the name of the process in an MS Windows XP guest, by
knowing how the process table is managed by the operating
system.

It is worth pointing out that HyperDbg can be used to
debug any piece of code of the guest system, including
critical components such as the process scheduler, or
interrupt and exception handlers. Indeed, Figure 3 shows that
the guest operating system has been stopped while executing
the PS/2 keyboard/mouse driver (i8042prt.sys). Because the
framework on which the debugger is built is completely
transparent to the analyzed system, the user can use the
keyboard to interact with the debugger even though the
keyboard driver of the guest is being debugged.

HyperDbg consists of fewer than 1600 lines of code: ~25%
of the code implements the graphical interface, ~23% of
the code provides the facilities required for keyboard-based
user interaction, and the remaining ~52% is responsible for
handling events and for all the other interactions with the
framework. Note that certain functionalities (e.g.,
disassembling a code region) are implemented directly in the
framework since, most likely, they will be used for other
types of analysis as well. The framework is about four times
bigger than the debugger (without considering the disassembly
module embedded in the framework, as it is based on an
off-the-shelf disassembler). We believe these numbers are
significant: the small number of lines of code we had to write
to implement HyperDbg shows that complex analysis tools like
an interactive kernel debugger are straightforward to
implement using our framework.

^The screenshot in Figure 3 was taken using our development
environment, based on an Intel x86 emulator supporting
extensions for virtualization (i.e., BOCHS).

The remainder of this section describes how we used the
facilities of the framework to implement the user interface
and the component that receives commands from the user.

User Interface. 

Although the graphical user interface of the debugger is
rough, its implementation is very challenging. The reason
for the complexity is that we cannot rely on any high-level
graphical facility available in the analyzed system to render
the interface. Such an approach would be too OS-dependent and
not transparent at all. The lack of graphical primitives
obliged us to interact directly with the video card. The video
memory is mapped at a fixed address in the guest, and thus the
unmodified inspection and manipulation API (i.e., GuestRead
and GuestWrite) can be used by the debugger to render the
interface. Note that this approach depends neither on the OS
nor on the hardware. We developed a small video library that
provides basic graphical functionalities and translates our
requests into data that are written directly into the memory
of the video card. Before rendering the graphical interface to
the screen, the debugger backs up the content of the video
memory, and it restores that content right before resuming the
execution of the analyzed system.
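
The backup, render, and restore cycle can be sketched as
follows. The sketch is a simplification under stated
assumptions: video memory is simulated by a bytearray laid out
as a linear framebuffer with hypothetical dimensions, writes go
directly into that buffer instead of through GuestWrite, and
fill_rect and DebuggerScreen are hypothetical names.

```python
# Hypothetical framebuffer geometry, chosen small for the example.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 16, 8, 4


def fill_rect(video, x, y, w, h, pixel):
    """Basic drawing primitive: fill a w-by-h rectangle with one pixel value."""
    for row in range(y, y + h):
        start = (row * WIDTH + x) * BYTES_PER_PIXEL
        video[start:start + w * BYTES_PER_PIXEL] = pixel * w


class DebuggerScreen:
    def __init__(self, video):
        self.video = video
        self.saved = None

    def show(self):
        # Back up the guest's screen content before drawing over it.
        if self.saved is None:
            self.saved = bytes(self.video)
        fill_rect(self.video, 2, 1, 8, 4, b"\xff\xff\xff\x00")

    def hide(self):
        # Restore the saved content right before the guest is resumed.
        if self.saved is not None:
            self.video[:] = self.saved
            self.saved = None


video = bytearray(range(256)) * 2  # 16 * 8 * 4 = 512 bytes of guest pixels
original = bytes(video)

screen = DebuggerScreen(video)
screen.show()
while_shown = bytes(video)
screen.hide()
after_hide = bytes(video)
```

Restoring the saved bytes leaves the guest's display exactly as
it was, so the guest cannot observe that the debugger drew on
the screen.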

User Interaction. 

User interaction is keyboard-based. When in non-root
mode, the user can switch into HyperDbg by pressing a
hot-key. Then, in root mode, the user can control the
debugger. For these reasons, HyperDbg must be able to
intercept keystrokes both in root and non-root mode. To
intercept keystrokes in non-root mode we monitor all the read
operations from the hardware I/O port devoted to the
keyboard. In other words, HyperDbg registers with the core for
all the IOOperationPort events that satisfy the event
condition port = KEYBOARD_PORT && access = read. When such an
operation is detected, HyperDbg checks whether the key
pressed corresponds to the hot-key that enables the debugger.
If the key pressed matches the hot-key, the debugger pops up
the graphical interface and waits for commands. Otherwise, the
debugger passes the keystroke to the analyzed system, so that
the latter continues its execution as if the keystroke had
been read directly from the keyboard. Keyboard handling in
root mode is done by polling the keyboard hardware I/O port.
Since direct access to I/O ports is not permitted to any
analysis tool, the debugger relies on an API function exported
by the framework, which mediates all accesses to I/O ports and
allows (if the permission is granted at compile time) certain
analysis tools to read data from certain I/O ports.
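
The non-root interception logic can be sketched as a small
event-dispatch simulation. The sketch is illustrative only: the
Core and Debugger classes, the "io_port" event label and the
return values are hypothetical; 0x60 is the standard PS/2 data
port, and the hot-key is assumed to be F12 (scancode 0x58 in
scancode set 1).

```python
KEYBOARD_PORT = 0x60  # PS/2 keyboard data port
HOTKEY = 0x58         # assumed hot-key: F12 in scancode set 1


class Core:
    """Toy event core: tools register (event type, condition, handler)."""

    def __init__(self):
        self.subscriptions = []

    def register(self, event_type, condition, handler):
        self.subscriptions.append((event_type, condition, handler))

    def dispatch(self, event_type, **event):
        for etype, condition, handler in self.subscriptions:
            if etype == event_type and condition(event):
                return handler(event)
        return "guest"  # nobody is interested: the guest proceeds untouched


class Debugger:
    def __init__(self, core):
        self.active = False
        core.register(
            "io_port",
            # Event condition from the text: keyboard port, read access.
            lambda e: e["port"] == KEYBOARD_PORT and e["access"] == "read",
            self.on_keyboard_read,
        )

    def on_keyboard_read(self, event):
        if event["value"] == HOTKEY:
            self.active = True  # pop up the interface, wait for commands
            return "debugger"
        return "guest"          # forward the keystroke to the guest


core = Core()
debugger = Debugger(core)

ordinary = core.dispatch("io_port", port=0x60, access="read", value=0x1E)
active_after_ordinary = debugger.active
hotkey = core.dispatch("io_port", port=0x60, access="read", value=HOTKEY)
```

Filtering on the event condition keeps the debugger out of the
way for every I/O operation other than keyboard reads.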

5.2 Other Possible Uses of the Framework 

HyperDbg demonstrates that our framework is very versatile
and that it enables new opportunities for dynamic analysis,
which we will explore in our future research.

An interesting extension of HyperDbg would be support for
kernel-level omniscient debugging. Omniscient debugging
allows developers to inspect the status of their programs at
past execution instants, in order to detect the cause of a
failure without the need to run the target program multiple
times [?]. HyperDbg could be extended to allow a user to
record and inspect the values a memory location held over
time, and the exceptions and interrupts that occurred. Such a
feature would make it easier for a user to discover when a
memory location of the kernel gets corrupted and which
instruction is responsible for the corruption. Moreover, the
ability to log asynchronous events, such as interrupts, would
make it possible to spot defects connected to
non-deterministic behaviors of the analyzed system. Our
framework already offers all the necessary facilities for this
kind of debugging: exceptions and interrupts can be traced
natively by the framework, and memory accesses can be traced
using watchpoints.
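
One possible shape for such a recording facility is sketched
below. It is a thought experiment rather than an existing
HyperDbg feature: it assumes a watchpoint handler is told the
time, the address of the writing instruction, the written
address and the new value, and the WriteHistory class with its
methods is hypothetical, as are the sample addresses.

```python
class WriteHistory:
    """Log of writes to watched locations, appended in time order."""

    def __init__(self):
        self.records = []  # (time, instruction address, address, value)

    def on_write(self, time, eip, addr, value):
        # Would be invoked from a watchpoint event handler.
        self.records.append((time, eip, addr, value))

    def value_at(self, addr, time):
        """Value held by addr at a past instant (None if never written)."""
        value = None
        for t, _eip, a, v in self.records:
            if a == addr and t <= time:
                value = v
        return value

    def first_bad_write(self, addr, is_valid):
        """Time and instruction of the first write that breaks is_valid."""
        for t, eip, a, v in self.records:
            if a == addr and not is_valid(v):
                return t, eip
        return None


WATCHED = 0x80550000
history = WriteHistory()
history.on_write(10, 0x804D1000, WATCHED, 1)
history.on_write(20, 0x804D2000, WATCHED, 2)
history.on_write(30, 0xF8A01234, WATCHED, 0xDEADBEEF)

past_value = history.value_at(WATCHED, 25)
culprit = history.first_bad_write(WATCHED, lambda v: v < 100)
```

Querying the log answers both questions raised in the text: what
the location contained at a past instant, and which instruction
first stored an invalid value.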

Another interesting application of our framework would be
dynamic aspect-oriented programming of operating system
kernels. As discussed in Section 2, several approaches have
been proposed to apply AOP to kernels. The main advantage
offered by our framework over the approaches proposed so far
is that it does not require any modification of the source
code of the kernel, nor any modification of the image of the
kernel in memory. Moreover, our framework protects the running
kernel from defects in the woven code. One approach to
facilitate the use of such technology would be to provide
programmers with a source-to-source translator that translates
aspect-oriented code written in languages like AspectC [?]
into C code that uses the API offered by our framework. In
particular, the translator would be responsible for
translating pointcuts into API calls that trace the
corresponding events, using advice as event handlers, and for
translating all pointer dereferences into calls to the
inspection API to read the memory of the guest.

6. CONCLUSIONS 

We proposed a framework to perform complex run-time
analyses of both system- and user-level code on commodity
production systems. The framework exposes an API that
eases the development of analysis tools on top of it. The
approach we described leverages hardware extensions for
virtualization available on modern processors to overcome the
limitations that affect existing approaches for the analysis
of system-level code. In particular, the solution we proposed
does not require recompiling or rebooting the target system,
it is not invasive, it is almost completely OS-independent,
and it guarantees that a defect in an analysis tool cannot
damage the framework itself or the analyzed system. To
demonstrate its potential, we developed HyperDbg, an
interactive kernel-level debugger for Microsoft Windows XP.
HyperDbg and the framework have been released as an open
source package.